ABSTRACT


Hunting  for  huntingtin  associated  factors:  Identification  and  characterization  of 
huntingtin  expanded  polyglutamine  aggregate-associated  factors  and  their  impact  on 
Huntington  disease  model  cellular  toxicity. 

Maggie  P  Wear,  PhD  2016 

Thesis  directed  by:  Frank  Shewmaker,  PhD,  Assistant  Professor,  Department  of 
Pharmacology 

Gene mutations resulting in the formation of insoluble protein aggregates, or
amyloid-like structures, have been correlated with a number of pathological conditions.
Huntington’s disease (HD) is one of the most prevalent neurodegenerative diseases,
characterized by movement, memory, behavioral, and cognitive difficulties, which
become more severe as the disease progresses. Protein aggregation in HD is caused by
expansion of the CAG repeat tract, also known as a polyglutamine (polyQ) expansion, in
exon one of the Huntingtin protein (Htt). Research has shown that Huntingtin polyQ
(Htt-polyQ) mutant expansions beyond 35 repeats result in protein aggregation. These
Htt-polyQ aggregates form amyloid-like inclusions characterized by cross-beta structure and
insolubility, and can result in loss of Huntingtin protein function or sequestration of
essential cellular proteins that tightly interact with the aggregate. Thus, proteins
interacting with Htt-polyQ aggregates may mediate disease pathogenesis by conferring de
novo cytotoxicity or by contributing to a general loss of function. Determining how and
why specific proteins interact with polyQ aggregates is important to understanding
disease  mechanisms.  We  hypothesized  that  polyQ  aggregates  recruit  and/or  sequester 
proteins  with  common  biophysical  properties  like  size,  charge,  or  conserved  domains.  To 
examine this, a mass spectrometry-based approach for non-targeted identification of
intracellular amyloid-forming and amyloid-associated proteins was developed. We
demonstrated that long intrinsically disordered (ID) domains - defined as domains of
> 100 amino acids that lack a defined structure - are a common biophysical property shared
by proteins associated with Htt-polyQ aggregates. Deletion analysis was used to show
that ID domains were required for association of two specific proteins, Sgt2 and FUS,
with  the  Htt-polyQ  aggregates.  This  research  has  allowed  the  unbiased  identification  of 
proteins  associated  with  polyQ  aggregates  as  well  as  the  identification  of  ID  domains  as  a 
feature  required  for  the  association  of  specific  proteins  with  these  aggregates.  Recent 
studies  show  that  careful  analysis  of  ID  domain  conformational  ensembles  can  reveal 
drug  target  sites  within  these  unstructured  regions.  Thus,  the  development  of  drugs 
targeting  ID  domains  may  inhibit  cellular  protein  interaction  with  Htt-polyQ  aggregates, 
potentially  reducing  the  neuron  death  associated  with  Huntington’s  disease. 

CHAPTER 1: Introduction and Background


Huntington  Disease 

Upwards  of  50  disorders  with  a  disparate  symptomatology  are  correlated  with 
protein  aggregates  (107).  These  protein  aggregates  occur  as  cytoplasmic,  nuclear,  or 
extracellular  deposits  and  have  been  linked  to  disease  pathogenicity  (123;  124;  163;  167; 
195).  The  proteinaceous  deposits  found  in  these  disorders  are  primarily  composed  of  a 
single,  disease-linked  protein,  though  they  are  often  associated  with  other  cellular 
proteins (53; 120; 127; 198; 226; 240). A subset of these disease-related aggregating
proteins has been examined in depth, including Aβ (implicated in Alzheimer’s disease),
tau (frontotemporal lobar degeneration, FTLD), α-synuclein (Parkinson’s disease), prion
protein (PrP) (transmissible spongiform encephalopathies, TSEs), and the polyglutamine
(polyQ) proteins huntingtin (htt, implicated in Huntington disease, HD) and ataxin-3
(spinocerebellar ataxia 3, SCA3). Although we know that these proteins aggregate in
neurodegenerative diseases, the roles of the precursor aggregates, the end-stage visible
aggregates, and the aggregate-associated proteins in neuronal cell death remain unclear.

The  intra-  or  extra-cellular  accumulation  of  a  number  of  neurodegenerative 
disease  proteins  is  characterized  by  protein  aggregates  and  selective  neuronal  death 
resulting  in  cognitive  and  behavioral  changes  (Table  1).  One  of  the  most  prevalent  of 
these  is  Huntington  Disease.  First  described  by  George  Huntington  in  1872,  HD  is 
characterized  by  spasmodic  movements  of  the  limbs  and  facial  muscles,  which  intensify 
with  disease  progression.  In  the  US,  HD  affects  5-10  in  every  100,000  individuals  with 
100% penetrance when the disease allele is present (191). With a cost of $4,000 to
$30,000  per  patient,  per  year,  in  direct  medical  costs  and  at  least  300,000  patients 
currently  suffering  from  HD  in  the  US  alone,  care  of  these  individuals  exceeds  $5  billion 
per year (41; 46). Survival for HD patients is typically 10-20 years from symptom onset,
and with medical costs rising as the disease progresses, the average cost per patient
exceeds $250,000 over the course of the disease. These are simply direct medical costs and
do  not  reflect  the  emotional  and  other  costs  to  the  patient  and  family.  With  no  cure, 
treatments  for  HD  exclusively  address  symptomatology  and  do  not  provide  prolonged 
survival  for  the  patients. 

The  pathology  of  HD  has  been  linked  to  the  huntingtin  protein,  specifically  the 
amino-terminal  polyglutamine  (polyQ)  region.  Expansion  of  the  glutamine  coding  CAG 
nucleotide  triplet  repeat  beyond  40  repeats  (referred  to  as  Htt-polyQ)  results  in  protein 
aggregation  and  neuronal  cell  death  (167).  Although  many  parts  of  the  brain  are  affected 
in  HD,  the  striatum,  a  subcortical  region  of  the  forebrain,  experiences  the  most  massive 
degeneration.  The  striatum  is  involved  in  movement,  memory,  and  decision-making.  In 
HD,  the  medium  spiny  neurons  of  the  striatum  are  lost,  resulting  in  movement  difficulties 
along  with  cognitive  and  behavioral  changes  (35),  with  loss  of  up  to  30%  of  overall  brain 
weight  over  the  course  of  the  disease  (71;  114).  Disease  progression  is  characterized  by 
loss  of  neurons  resulting  in  deterioration  of  physical,  cognitive,  and  emotional  faculties. 
Most  HD  patients  ultimately  succumb  to  pneumonia,  or  commit  suicide  (41;  166). 

The mutated protein in HD, huntingtin, is encoded by a gene on chromosome four and has
been shown to interact with at least 100 proteins in the cell, but a distinct function for
this protein remains elusive (67). The remarkable correlation between the length of the
polyglutamine repeat and disease severity has led researchers to conclude that the polyQ
aggregates are toxic and causative in the pathophysiology of the disease (32; 39; 131;
151; 177; 223; 227). Recent studies have examined the accumulation of aggregates in
different types of neurons. Medium spiny projection neurons, those lost in HD,
accumulate fewer polyQ aggregates than large interneurons, which are mostly spared in
HD  (71;  114).  This  has  led  to  the  speculation  that  aggregates  are  protective  in  these  large 
interneurons  and  other  misfolded  Htt-polyQ  species  such  as  monomers  or  oligomers  are 
responsible  for  the  medium  spiny  neuron  death  (1;  74;  134).  Both  molecular  chaperones 
and  the  ubiquitin  proteasome  system  (UPS)  are  crucial  to  refolding  or  degrading 
misfolded  proteins.  Studies  have  shown  that  heat  shock  and  UPS  proteins  are  trapped  in 
the  insoluble  Htt-polyQ  inclusion  bodies  (223)  and  that  these  aggregates  are  ubiquitinated 
(39;  45).  These  are  just  a  few  of  over  100  huntingtin-interacting  proteins.  A  better 
understanding  of  other  huntingtin-interacting  proteins  and  their  molecular  pathways  may 
be  key  to  understanding  the  Htt-polyQ  toxicity. 

Protein  Aggregation  in  Disease 

As  noted  above,  many  neurodegenerative  diseases  are  characterized  by  insoluble 
protein  aggregates.  These  protein  deposits  were  first  observed  in  patients  who  died  of 
systemic  amyloidosis.  The  aggregates  stained  with  iodine,  which  is  used  to  detect  starch; 
hence  they  were  named  ‘amyloid,’  or  ‘starch-like’  (18;  187).  Since  that  time,  great 
advances  have  been  made  in  characterizing  the  formation,  biophysical  properties,  and 
disease  linkage  of  protein  aggregates. 

There are two theoretical mechanisms of aggregate polymerization: (i) linear (also known
as isodesmic) and (ii) nucleation-elongation polymerization (57). The linear polymerization
theory assumes that the dissociation constant for any monomer addition is identical,
whereas nucleation-elongation includes the formation of a ‘nucleus’ from several
monomers, which then serves as a precursor structure increasing the favorability of
monomer addition. It is generally accepted that amyloid forms by nucleated-elongation
polymerization (1; 25; 29; 76; 88; 98; 108; 145; 173), although some maintain that
many precursor states lead to fibrillogenesis (11; 82; 91; 128; 172;
245). For any specific protein a number of precursor conformations may exist. These are
referred to as prefibrillar oligomers, protofibrils, oligomers, or nuclei, and while the
conformations differ, these terms are used somewhat interchangeably. Reports indicate
that  huntingtin  forms  precursor  aggregates  and  that  these  species  are  key  to  inclusion 
body  formation  (30;  157;  222). 
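
The difference between the two polymerization models can be illustrated with a minimal numerical sketch. It is not taken from the cited work and rests on simplifying assumptions: equilibrium conditions, a nucleus of two monomers, a single elongation constant K_e for every later addition, and a cooperativity factor sigma = K_n / K_e (sigma = 1 reproduces the isodesmic case). The function names and the dimensionless concentration K_e x [total protein] are illustrative choices only.

```python
def free_monomer_x(kc_total, sigma, tol=1e-12):
    """Solve the mass balance for x = K_e * [free monomer].

    Assumed model (dimer nucleus):
        K_e * [total] = (1 - sigma) * x + sigma * x / (1 - x)**2
    sigma = K_n / K_e; sigma = 1 is the isodesmic (linear) model.
    """
    def mass_balance(x):
        return (1.0 - sigma) * x + sigma * x / (1.0 - x) ** 2

    # The mass balance increases monotonically on [0, 1), so bisection works.
    lo, hi = 0.0, 1.0 - 1e-15
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mass_balance(mid) < kc_total:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fraction_polymerized(kc_total, sigma):
    """Fraction of protein mass that is not free monomer."""
    x = free_monomer_x(kc_total, sigma)
    return 1.0 - x / kc_total


if __name__ == "__main__":
    for kc in (0.5, 1.0, 2.0):
        iso = fraction_polymerized(kc, sigma=1.0)
        nuc = fraction_polymerized(kc, sigma=1e-4)
        print(f"K_e*c_total={kc}: isodesmic={iso:.3f}, nucleated={nuc:.3f}")
```

Under these assumptions the isodesmic model polymerizes gradually as concentration rises, whereas the nucleated model stays almost entirely monomeric below a critical concentration (K_e x [total] near 1) and polymerizes sharply above it.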

The  toxic  form  of  amyloid  proteins  has  been  a  matter  of  some  debate. 
Observations  in  the  brain  show  that  while  many  neurons  contain  aggregates,  the  specific 
types  of  neurons  lost  in  HD  do  not  contain  observable  aggregates  (252).  This  has  led 
researchers  to  speculate  that  aggregates  or  inclusions  are  simply  the  by-product  of 
abnormal  protein  accumulation,  or  perhaps  represent  a  protective  event  meant  to 
sequester  abnormal  proteins  (4;  169;  175;  188).  Using  time-lapse  microscopy  to  monitor 
primary  neurons  for  aggregation,  the  laboratory  of  Dr.  S.  Finkbeiner  showed  that  neurons 
containing  visible  aggregates  survived  better  than  those  without  visible  aggregates  (4). 
Furthermore,  fluorescence  resonance  energy  transfer  (FRET)  experiments  in  cells 
overexpressing  polyQ-GFP  fusion  protein  showed  that  soluble  oligomeric  species  appear 
to be the most toxic isoform (201). Compounds that induce aggregate formation have
been shown to reduce toxicity in polyQ cell models (13). Interestingly, apoptotic cells in
a zebrafish model expressing expanded polyQ huntingtin did not contain aggregates
(178).  Together  these  studies  strongly  suggest  that  soluble  Htt-polyQ  oligomers  are  the 
toxic  mediators  of  cell  death,  prior  to  formation  of  visible  aggregates. 

In  contrast,  a  number  of  other  studies  suggest  that  the  insoluble  polyQ  aggregates 
are,  in  fact,  the  toxic  species.  Examination  of  polyQ  species  from  HD  mouse  and  cell 
culture  models  using  agarose  gel  electrophoresis  indicates  that  oligomers  can  be  detected 
prior  to  cell  toxicity  and  that  aggregate  size  correlates  with  toxicity  (233).  Increased 
ubiquitination,  correlated  with  increased  aggregate  formation,  also  results  in  increased 
cell toxicity (42). It has been observed that aggregate formation leads to cell death in
primary neurons (247) and PC-12 immortalized neuronal cells (239), but not in
non-neuronal HEK293 cells, suggesting that the cell type-dependent toxicity may be in part
the result of the mitotic state of the cells. In this model, cycling cells, like HEK293 cells,
are able to redistribute the aggregates between mother and daughter cells, thereby
reducing the toxic effects to any single cell. However, the majority of neuronal cells are
mitotically inactive; therefore, any aggregates that accumulate cannot be redistributed,
resulting in subsequent cell death.

One recent study showed that properly folded polyQ monomers are α-helically
folded and non-toxic, but that misfolded monomers or oligomers form β-sheets that
correlate with cellular toxicity (142). These misfolded proteins, whether precursors or
aggregated  species,  have  a  higher  propensity  to  engage  in  aberrant  interactions  (193;  194; 
203; 236). The huntingtin protein interacts with between 50 and 200 proteins, forming a
myriad of heteroprotein complexes; the delicate balance of these complexes is likely
altered by the polyQ expansion and subsequent misfolding of huntingtin protein.

Although  a  diverse  collection  of  proteins  can  adopt  amyloid  conformation,  the 
structure  of  the  formed  fiber  is  consistent,  as  described  below  (197;  199;  234).  Many 
techniques have been used to determine the biophysical properties of amyloid, including
X-ray diffraction (53), solid-state nuclear magnetic resonance (NMR) (3), electron
paramagnetic resonance (EPR) spectroscopy (206), circular dichroism (CD), Fourier
transform infrared spectroscopy (FTIR), and electron microscopy (EM) (98). Amyloid
structure features parallel in-register β-strands running perpendicular to the fibrillar axis.
These β-strands become laminated β-sheets, resulting in a distinctive cross-β rope-like
structure when viewed by EM (194; 243). It is this β-rich structure
of amyloid that results in strong resistance to degradation, detergents, proteolysis, and
mechanical  breakage  (49).  Amyloid  exhibits  distinct  tinctorial  properties  that  have  been 
exploited  to  monitor  amyloid  formation  in  many  disease  models;  specifically  staining 
with  thioflavins  (Th)  and  Congo  Red  (CR)  (83;  89;  197;  234). 

Mechanisms  of  polyQ  toxicity 

Although the genetic basis of HD and other polyQ diseases has been clear for
many years, the molecular basis of the disease remains unclear. The underlying
mechanisms of Htt-polyQ toxicity are predicted to involve many cellular pathways,
including transcription, intracellular transport, protein quality control, and
mitochondrial function.

Transcriptional dysregulation is proposed to occur when misfolded Htt-polyQ
accumulates in the nucleus of the neuron and sequesters transcription factors
including CREB binding protein (CBP), TATA-binding protein (TBP), and specificity
protein 1 (Sp1) (17; 21; 148). Although the mechanisms are very complex, the loss of
CBP  has  been  linked  to  reduced  Brain-derived  neurotrophic  factor  (BDNF)  mRNA  levels 
in  HD  neurons  (252).  Reduced  BDNF  levels  have  also  been  linked  to  altered  axonal 
transport (252). Htt-polyQ inhibition of anterograde and retrograde axonal transport is
polyQ length-dependent (70; 200), and Htt-polyQ alters the activity of N-methyl-D-aspartate
(NMDA) receptors, leading to deregulation of Ca²⁺ homeostasis via an unclear mechanism
(201; 202; 249).

Although  mostly  unexplored,  mitochondrial  dysfunction  has  also  been  correlated 
with  Htt-polyQ.  Mitochondria  in  the  HD  brain  exhibit  abnormal  energy  metabolism 
including decreased glucose metabolism, increased lactate concentration, and lower
membrane potential, which reduces the calcium load necessary to trigger the
mitochondrial permeability transition (121; 154).

One  of  the  best-characterized  molecular  pathways  implicated  in  polyQ  diseases  is 
the  ubiquitin  proteasome  system  (UPS).  The  UPS  is  responsible  for  degradation/recycling 
of  misfolded  or  damaged  proteins.  Htt-polyQ  is  ubiquitinated,  the  first  step  in  UPS 
recycling,  indicating  that  the  UPS  recognizes  the  misfolded  state  (9;  45;  210).  Both 
proteasome  subunits  and  chaperone  proteins  are  known  to  associate  with  huntingtin  and 
have  been  found  in  Htt-polyQ  aggregates  (93;  174).  While  this  indicates  that  the  UPS  is 
recruited  to  Htt-polyQ,  it  seems  unable  to  process  polyQ  aggregates.  The  loss  of  function 
of  the  UPS  results  in  the  accumulation  of  misfolded  proteins  -  both  those  associated  with 
Htt-polyQ  and  other  proteins  within  the  cell  -  leading  to  neuronal  cell  death  (9;  65;  156). 

Association of huntingtin-interacting proteins with misfolded mutant huntingtin -
be  it  monomeric,  oligomeric,  or  in  aggregate  form  -  has  been  shown  to  sequester  other 
cellular  proteins  away  from  their  normal  functions,  leading  to  a  loss-of-function  toxicity 
(24;  40;  74).  This  phenomenon  was  specifically  elucidated  for  BDNF  in  axonal  transport 
described  above.  Furthermore,  Gidalevitz  and  colleagues  showed  that  co-expression  of 
expanded  polyQ  with  temperature-sensitive  proteins  in  C.  elegans  resulted  in  the 
temperature-sensitive proteins being misfolded at a permissive temperature (65). This
suggested that the misfolding of expanded polyQ proteins may induce misfolding of
non-associated proteins.

It  is  clear  that  no  single  molecular  mechanism  can  fully  explain  the  neuronal 
death  observed  in  HD.  Therefore,  identifying  misfolded  polyQ-associated  proteins  that 
mediate  aggregation  and  toxicity  will  allow  for  improved  understanding  of  HD 
pathogenesis.  One  major  obstacle  in  identifying  these  proteins  is  the  inherent  difficulty  in 
isolating  Htt-polyQ  aggregates.  Our  lab  has  developed  a  method  -  Technique  for 
Amyloid  Purification  and  Identification  or  TAPI  -  to  purify  amyloid  aggregates  in  an 
unbiased  way  that  allows  for  identification  of  aggregating  proteins  as  well  as  proteins  that 
are strongly associated with the amyloid (Figure 1). We hypothesize that identification of
these proteins will improve understanding of how they contribute to the
disease-associated cellular toxicity linked to Huntington disease.

[Figure 1 panels: Step I, ultra-centrifugation of the cell lysate; Step II, detergent treatment; aggregate-associated protein identification and analysis]


Figure  1:  Technique  for  Amyloid  Purification  and  Identification  (TAPI). 

The  TAPI  method  utilizes  sucrose  pad  ultra-centrifugation,  detergent  treatment  and  gel 
electrophoresis  to  purify  aggregates  from  cell  lysates  in  a  stringent,  unbiased  manner. 
Coupling  this  purification  with  mass  spectrometry  allows  for  identification  of  the 
aggregating  protein  and  proteins  trapped  in  the  aggregate. 

Identification of mutant htt-associated proteins is important for defining the
molecular pathways involved in HD cytotoxicity. Current methods such as
immunoprecipitation or deletion and overexpression screening are limited by protein bias
and the requirement for a priori knowledge about targeted factors. The TAPI method is
unbiased  and  requires  no  prior  knowledge  about  the  protein  of  interest  (109;  231).  Our 
current  knowledge  of  Htt-polyQ  protein  associations  has  implicated  a  number  of  cellular 
pathways,  but  no  conserved  biophysical  property  or  protein  domain  required  for  a  protein 
to  be  trapped  by  Htt-polyQ  aggregates  has  been  identified. 

Here, using the TAPI method, we confirmed previously identified Htt-interacting
proteins, thus showing that purification of Htt-polyQ aggregates can be utilized to
identify Htt-interacting proteins. We also identified and confirmed new Htt-interacting
proteins and began exploring their role in HD cellular toxicity. Bioinformatic analysis of
TAPI-identified Htt-interacting proteins also revealed that disordered protein domains are
common to the vast majority of Htt-interacting proteins. Deletion of the disordered
domain  of  selected  proteins  eliminates  Htt-polyQ  aggregate  interaction,  suggesting  that 
this  domain  may  be  necessary  for  cellular  protein  interaction  with  mutant  huntingtin. 

Table 1: Neurodegenerative disease-associated amyloid-forming proteins.

Neurodegenerative Disease | Amyloid-forming protein | Cells/Regions Affected
--------------------------|-------------------------|-----------------------
Alzheimer’s Disease | Beta-amyloid | Cortex, hippocampus, basal forebrain, brain stem
Parkinson’s disease | Alpha-synuclein | Substantia nigra, cortex, locus ceruleus, raphe, etc.
Transmissible spongiform encephalopathy (e.g. Bovine spongiform encephalopathy) | PrPsc | Cortex, thalamus, brain stem, cerebellum, other areas
Fatal Familial Insomnia | PrPsc | Cortex, thalamus, brain stem, cerebellum, other areas
Huntington Disease | Huntingtin | Striatum, basal ganglia, cortex, and other regions
Dentato-rubral and Pallido-Luysian atrophy (DRPLA) | Atrophin-1 | Basal ganglia, brain stem, cerebellum, spinal cord
SCA1-7 | Ataxin 1-7 | Basal ganglia, brain stem, cerebellum, spinal cord
Spinal Bulbar Muscular Atrophy (SBMA) | Androgen Receptor | Basal ganglia, brain stem, cerebellum, spinal cord
Fronto-temporal dementia | Tau | Frontal and temporal cortex, hippocampus
Kuru | PrPsc | Cortex, thalamus, brain stem, cerebellum, other areas
Gerstmann-Sträussler-Scheinker (GSS) | PrPsc | Cortex, thalamus, brain stem, cerebellum, other areas
Type II Diabetes | Islet amyloid polypeptide (IAPP) | Pancreatic cells
Familial amyloidotic cardiomyopathy (FAC) | Transthyretin | Cardiomyocytes
Sickle-cell disease | Hemoglobin (Hb) | Erythrocytes
Dialysis-related amyloidosis | Beta-2 microglobulin (B2M) | Extracellular affecting joints

CHAPTER 2: Proteins with intrinsically disordered domains are
preferentially recruited to polyglutamine aggregates


Introduction 

Accumulation  of  intracellular  and  extracellular  protein  aggregates  is  a  common 
feature  of  multiple  age-associated  human  disorders,  particularly  neurodegenerative 
diseases  (167;  203).  Amyloid  is  a  highly-ordered  aggregate  that  consists  of  polypeptides 
arranged  in  filamentous,  beta  sheet-rich  structures  with  abundant  interlocking  hydrogen 
bonds  between  sheets  (8;  130;  184;  209).  Multiple  proteins  can  adopt  this  architecture 
(220),  and  once  established,  they  all  share  extraordinary  resistance  to  proteolysis, 
chaotropic  agents,  detergents,  and  mechanical  breakage  (49;  109;  112;  126;  182). 
Amyloid  aggregates  -  or  their  oligomeric  precursors  -  are  believed  to  contribute  to 
cellular  toxicity  through  a  variety  of  mechanisms,  such  as  physically  disrupting 
membranes  or  sequestering  essential  heterologous  proteins  (16;  79). 

Triplet  CAG  expansions  resulting  in  polyglutamine  (polyQ)  tracts  are 
characteristic  of  at  least  nine  neurodegenerative  diseases  (143;  152),  of  which  the  most 
common  is  Huntington  disease  (HD).  Proteins  with  expanded  tracts  of  polyQ,  which  are 
natively  unstructured,  are  predisposed  to  adopt  conformations  with  amyloid-like 
properties  (23;  176).  When  polyQ  tracts  of  the  Huntingtin  protein  (Htt)  exceed  ~35 
glutamines,  Htt  fragments  can  form  intracellular  inclusions  (39;  69;  176;  186)  resulting  in 
impaired  Htt  function  and  aberrant  protein  interactions  (40).  These  inclusions  (or  their 
early  precursors)  may  confer  a  dominant  gain-of-function  cellular  toxicity  (86;  170). 
Other  studies  suggest  that  aggregation  caused  by  polyQ  expansion  can  result  in  loss  of 
protein function (and thus pathology), either directly via functional loss of the
aggregating  species  (78),  or  from  the  sequestration  of  proteins  that  tightly  interact  with 
the  aggregate  (155;  243).  Thus,  the  proteins  that  interact  with  aggregates  may  mediate 
pathological  mechanisms,  either  by  conferring  new  cytotoxicity  or  by  contributing  to  a 
general  loss  of  function.  For  these  reasons,  determining  how  and  why  specific  proteins 
interact  with  polyQ  aggregates  is  important  to  understanding  -  and  ultimately  combating 
-  disease  mechanisms. 

Recently,  we  established  a  mass  spectrometry-based  approach  for  non-targeted 
identification of intracellular amyloid-forming and amyloid-associated proteins (109;
110). Our method, called TAPI (Technique for Amyloid Purification and Identification),
exploits  the  biophysical  characteristics  of  amyloid,  namely  detergent  resistance  and  high 
molecular  weight,  to  isolate  amyloid  aggregates  from  cell  lysates.  Stringent  nuclease 
treatment  followed  by  SDS-gel  electrophoresis  eliminates  non-specific  or  loosely 
associated  proteins.  Thus  the  TAPI  protocol  differs  from  antibody-based  pull-down 
protocols  by  limiting  positive  hits  to  those  most  tightly  associated  with  amyloid-like 
aggregates. 

In  this  study  we  applied  the  TAPI  protocol  coupled  with  mass  spectrometry  to 
aggregates formed by polyQ-expanded huntingtin fragments in both yeast (S. cerevisiae)
and mammalian cells (PC-12, rat neuronal precursor). Previously, various proteins have
been shown to interact with Htt or Htt fragments (75; 97; 118; 155; 208; 223), but our
approach was designed to identify proteins that are directly trapped within the
amyloid-like polyQ aggregates to determine the types of proteins most prone to irreversible
inclusion. We hypothesized that these aggregates would recruit and/or sequester proteins
with common biophysical properties. We observed that inclusion into polyQ aggregates
was mediated by long intrinsically disordered (ID) protein domains (> 100 amino acids)
in two evolutionarily divergent cell models. Also, many proteins normally associated with
neuronal  aggregation  in  other  degenerative  diseases  (especially  amyotrophic  lateral 
sclerosis  (ALS))  were  disproportionately  recruited  into  polyQ  aggregates  in  mammalian 
cells.  This  study  expands  the  emerging  connection  between  ID  domains  and 
neurodegenerative  disease  (213),  and  demonstrates  that  long  ID  domains  predispose 
proteins  to  be  recruited  into  amyloid-like  aggregates. 
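
The "long ID domain" criterion (> 100 amino acids lacking defined structure) can be expressed as a simple check. The sketch below is illustrative only: it assumes per-residue disorder scores between 0 and 1 from some disorder predictor, with residues scoring above a hypothetical cutoff of 0.5 treated as disordered. Neither the predictor nor the cutoff is specified in this section, and the function names are invented for the example.

```python
def longest_disordered_run(scores, threshold=0.5):
    """Return (start_index, length) of the longest run of consecutive
    residues whose disorder score exceeds the threshold."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, score in enumerate(scores):
        if score > threshold:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return best_start, best_len


def has_long_id_domain(scores, min_length=100, threshold=0.5):
    """True if the protein has a disordered stretch of more than
    min_length residues (the > 100 amino acid criterion in the text)."""
    return longest_disordered_run(scores, threshold)[1] > min_length
```

Applied to every protein in a hit list and to every protein in the background proteome, a check of this kind would allow the two fractions of proteins with long ID domains to be compared.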

Results 

Isolation  &  Analysis  of  Polyglutamine  Aggregates  in  Yeast 

To  identify  aggregate-associated  proteins,  human  HttQ103-GFP  and  HttQ25-GFP  (134) 
were  expressed  under  control  of  a  galactose-inducible  (GAL1)  promoter  in  the  yeast 
Saccharomyces  cerevisiae  (Figure  2).  Both  expression  constructs  contain  the  human 
huntingtin  (Htt)  exon  1  fragment  with  polyQ  tracts  (103  or  25  glutamines,  respectively) 
fused in frame with green fluorescent protein (GFP). As observed previously,
HttQ25-GFP is soluble during expression, whereas HttQ103-GFP forms toxic cytoplasmic
SDS-resistant aggregates (134) (Figure 2A) that have amyloid-like tinctorial properties and
can be trapped at the top of an SDS acrylamide gel (109) (Figure 2B; Figure 3B). Proteins
that  are  specifically  associated  with  Htt  amyloid-like  aggregates  were  isolated  using  the 
TAPI  method  (109;  110),  which  traps  the  large  detergent-resistant  species  in  acrylamide 
gel  matrix  for  subsequent  extraction  and  identification.  As  demonstrated  in  Figure  2B,  the 
TAPI method isolates amyloid-like aggregates of HttQ103-GFP, whereas the
non-aggregate-forming HttQ25-GFP does not form species large enough, or sufficiently
detergent-resistant, for isolation.

Figure 2: Polyglutamine-expanded Huntingtin exon 1 forms aggregates in yeast that
can be isolated by TAPI.

(A) Fluorescence microscopy of yeast expressing GFP-tagged Huntingtin exon 1
(Htt) with a normal (Q25) or an expanded polyglutamine tract (Q103). (B) Western
blot of GFP-Htt-Q25 and GFP-Htt-Q103 showing that high molecular weight
aggregates can be isolated from Htt-Q103 expressing yeast cells. Lysate = input;
TAPI = purified aggregate.

Proteins  tightly  associated  with  the  isolated  Htt-polyQ  aggregates  were  identified 
using  tandem  mass  spectrometry  (MS/MS).  Qualitative  comparison  of  all  identified 
proteins  from  the  HttQ103-GFP  samples  -  relative  to  the  HttQ25-GFP  samples  -  reveals 
a  subset  that  is  only  associated  with  the  large  polyQ  aggregates  (Table  1;  Appendix  1).  In 
total,  52  proteins  were  considered  polyQ-associated  because  they  were  reproducibly 
found  in  the  expanded  HttQ103  aggregate  (Table  1)  while  absent  in  the  HttQ25  control 
sample  (described  in  methods).  To  confirm  that  our  approach  is  not  enriching  for 
abundant,  large,  or  charged  proteins,  the  polyQ  aggregate-associated  proteins  were 
compared against the entire yeast proteome. No obvious differences in size distribution,
abundance ((122); Figure 4), or charge (the average pI of TAPI-identified proteins is 7.1, the
same as the approximation for the yeast proteome (180)) were observed between our
TAPI-identified proteins and the entire yeast proteome.
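
The qualitative comparison described above amounts to a set operation on protein identifications. The following sketch assumes that "reproducibly found" means present in every HttQ103 replicate and that "absent" means not detected in any HttQ25 replicate. The actual criteria are given in the methods and may differ, and the function name and protein identifiers are placeholders.

```python
def polyq_associated(q103_replicates, q25_replicates):
    """Proteins identified in every Q103 replicate and in no Q25 replicate.

    Each argument is a list of collections of protein identifiers,
    one collection per mass spectrometry run.
    """
    if not q103_replicates:
        return set()
    reproducible = set.intersection(*(set(run) for run in q103_replicates))
    control = set().union(*(set(run) for run in q25_replicates))
    return reproducible - control


# Placeholder identifiers, not data from this study.
q103 = [{"protA", "protB", "protC"}, {"protA", "protB", "protD"}]
q25 = [{"protB"}, {"protE"}]
hits = polyq_associated(q103, q25)
```

Here `hits` contains only `protA`: `protB` is excluded because it also appears in the control, and `protC` and `protD` are excluded because each was seen in only one Q103 replicate.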

Molecular  functions  of  HttQ103-GFP  aggregate-associated  proteins  in  S.  cerevisiae 

HttQ103-GFP  aggregate-associated  proteins  were  examined  using  gene  ontology 
(GO) and the Saccharomyces Genome Database (SGD) to assign their functions and
properties  (5;  6;  28;  117).  Unexpectedly,  RNA/DNA-binding  (mostly  RNA  binding; 
Table  1)  proteins  make  up  the  largest  percentage  of  HttQ103-GFP  aggregate-associated 
proteins. In fact, more than 1/3 of the polyQ-associated proteins are specifically
characterized as RNA-binding proteins (RBPs). Previous studies suggest that
RBPs may localize to aggregates because RNA co-aggregates with amyloid-forming
proteins (158). However, the TAPI method involves extensive RNase treatment
(109; 110); while we cannot conclude that RNA is completely absent, the vast majority of
RNA is eliminated prior to aggregate isolation (Figure 3A). As most RNA-dependent
interactions should be lost, a preponderance of RBPs in the HttQ103-GFP aggregate is
likely independent of RNA-mediated interactions.

Figure 3: RNA content of TAPI samples and confirmation of aggregation.

(A) Efficacy of RNase treatment in TAPI procedure. Yeast cell lysates from
Htt-Q103-GFP or Htt-Q25-GFP expressing cells were put through the TAPI procedure
(see methods and materials Chapter 4) with (lanes 5 and 6) or without (lanes 3 and
4) the addition of 3 mg RNase A enzyme. Samples were run with purified RNA
standards (Invitrogen, lanes 1 and 2) at indicated concentrations on a 0.8% agarose
gel in RNA-free conditions. (B) Th-T fluorescence of crude aggregates isolated
from yeast expressing Htt-Q103-GFP or HttQ25-GFP. The average fluorescence
emission at 490 nm over three experiments was compared to that of the control, the
purified prion-forming domain (NM) of yeast prion protein Sup35, using a two-tailed
t-test to determine statistical significance. Error bars represent standard deviation.

Figure 4: Comparison of protein size and abundance.

The yeast proteome is compared with the 52 proteins identified by TAPI as being
tightly associated with Htt-Q103-GFP. The proteins found to co-aggregate with
polyQ were not obviously larger or more abundant than the yeast proteome in
general (TAPI avg. 1.2 × 10^4 copies/cell, proteome avg. 1.3 × 10^4 copies/cell). TAPI
does not enrich for disproportionately large proteins (TAPI avg. 72 kDa,
proteome avg. 53 kDa). Protein abundance values were compiled from
western blotting (64), GFP expression (144), 2D gel analysis (62), and APEX analysis
(122).

Yeast  proteins  recruited  into  polyQ  aggregates  share  common  biophysical 
properties 

PolyQ  aggregates  have  been  proposed  to  induce  heterologous  protein  misfolding 
via  a  “cross-seeding”  mechanism  (60).  Previously,  Michelitsch  and  Weissman  observed 
that Q/N-rich domains have an increased propensity to adopt amyloid structure and
concluded  that  30  glutamine  and/or  asparagine  residues  within  an  80-amino  acid  stretch 
served  as  a  good  predictor  of  amyloid  formation  (137).  Assuming  that  the  polyQ 
aggregates  could  “cross-seed”  such  Q/N-rich  proteins,  we  analyzed  the  52  identified 
proteins  and  found  they  are  significantly  more  likely  to  have  Q/N-rich  domains  than  the 
whole of the yeast proteome (54% vs. 2% (137), respectively; Appendix 1). For
comparison, when simply measuring the total glutamine content of the identified proteins,
the HttQ103-GFP aggregate-associated proteins had only a two-fold greater total
percentage of glutamine content relative to the yeast proteome (9.7% vs.
3.8% (99), respectively; Appendix 1).
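The Q/N-rich criterion cited above (at least 30 glutamine and/or asparagine residues within an 80-amino acid stretch) can be illustrated with a short sliding-window scan. The following Python is only a minimal sketch of that rule, not the script used in this study; the function names and the toy sequence are hypothetical, and real analyses would read protein sequences from a proteome file.

```python
def max_qn_in_window(seq: str, window: int = 80) -> int:
    """Return the largest number of Q/N residues in any stretch of `window` residues."""
    flags = [1 if aa in "QN" else 0 for aa in seq.upper()]
    if not flags:
        return 0
    width = min(window, len(flags))  # short proteins are scanned as a single window
    count = sum(flags[:width])
    best = count
    # Slide the window one residue at a time, updating the running count.
    for i in range(width, len(flags)):
        count += flags[i] - flags[i - width]
        best = max(best, count)
    return best


def is_qn_rich(seq: str, window: int = 80, threshold: int = 30) -> bool:
    """Apply the 30-in-80 rule; window and threshold are the values cited in the text."""
    return max_qn_in_window(seq, window) >= threshold


def glutamine_fraction(seq: str) -> float:
    """Fraction of residues that are glutamine (total Q content, ignoring position)."""
    seq = seq.upper()
    return seq.count("Q") / len(seq) if seq else 0.0


if __name__ == "__main__":
    toy = "MA" + "Q" * 35 + "G" * 60  # hypothetical sequence, for illustration only
    print(is_qn_rich(toy), round(glutamine_fraction(toy), 3))
```

The two measures answer different questions: the window scan detects a local Q/N-rich domain, whereas the glutamine fraction reports overall composition, which is why the two enrichments reported above can differ so widely.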

An  enrichment  of  Q/N-rich  segments  also  implies  an  increase  in  intrinsically 
unstructured  protein  domains.  When  each  of  the  polyQ  aggregate-associated  proteins  was 
examined  for  global  intrinsic  disorder  (described  here:  (230)),  the  aggregate-associated 
proteins showed a higher average total percent intrinsic disorder relative to the average
value for the whole yeast proteome (48% vs. 20%, respectively). However, if the
identified proteins are specifically analyzed for containing discrete regions that are
intrinsically disordered (using the IUPred-L prediction algorithm described in the methods
(50)), the polyQ-associated proteins are strongly enriched for the presence of an ID
domain (Figure 5). Previous studies have classified ID domains as unstructured regions
greater than 20-40 amino acids long (205; 221). For the yeast proteins associated with
polyQ aggregates, almost all have an ID domain of >30 amino acids in length
(92% vs. 31% of the proteome control, respectively; Appendix 1). Moreover, ~2/3 of
the proteins contain very long ID domains of >100 amino acids (63% versus 9%; Figure
5), and strikingly, of these proteins, ~1/3 contain no Q/N-rich domain.
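To make the distinction between global percent disorder and discrete ID domains concrete, the sketch below takes a list of per-residue disorder scores (such as a predictor like IUPred would produce) and extracts contiguous regions at or above a 0.5 cutoff. It is a hedged illustration only: it does not call IUPred, the function names are hypothetical, and it assumes that an ID domain means an uninterrupted run of residues scoring at or above the cutoff.

```python
from typing import List, Sequence, Tuple


def disordered_regions(scores: Sequence[float], cutoff: float = 0.5,
                       min_len: int = 30) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) for runs of residues with
    score >= cutoff that are at least `min_len` residues long."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= cutoff:
            if start is None:
                start = i  # a new disordered run begins here
        elif start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start >= min_len:
        regions.append((start, len(scores)))
    return regions


def longest_id_domain(scores: Sequence[float], cutoff: float = 0.5) -> int:
    """Length of the longest uninterrupted disordered run (0 if there is none)."""
    runs = disordered_regions(scores, cutoff, min_len=1)
    return max((end - start for start, end in runs), default=0)


def percent_disorder(scores: Sequence[float], cutoff: float = 0.5) -> float:
    """Global percent of residues at or above the cutoff, regardless of contiguity."""
    if not scores:
        return 0.0
    return 100.0 * sum(1 for s in scores if s >= cutoff) / len(scores)


if __name__ == "__main__":
    # Hypothetical score profile: a 120-residue disordered N-terminus, then mostly order.
    profile = [0.9] * 120 + [0.1] * 50 + [0.6] * 29 + [0.2] * 10 + [0.7] * 30
    print(disordered_regions(profile), longest_id_domain(profile))
```

A protein would be counted as having a "very long" ID domain in the sense used above when `longest_id_domain` exceeds 100, while `percent_disorder` corresponds to the global measure.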

ID  domains  facilitate  protein  interactions  with  RNA,  so  the  large  cohort  of  RBPs 
we  found  associated  with  polyQ  aggregates  could  simply  be  a  result  of  these  proteins 
disproportionately  possessing  ID  domains.  The  RBP  subset  of  aggregate-associated  yeast 
proteins  was  compared  to  all  putative  and  characterized  RBPs  in  the  yeast  proteome  for 
ID  content  (Appendix  1).  A  majority  (70%)  of  the  HttQ103-GFP  aggregate-associated 
RBPs  contain  ID  domains  of  >100  amino  acids,  while  RBPs  in  general  rarely  have  such 
long  ID  domains  (~17%;  Appendix  1).  Thus,  the  high  frequency  of  RBPs  in  the  polyQ 
aggregates  might  result  from  the  presence  of  long  ID  domains  in  these  proteins,  rather 
than a result of some uncharacterized RNA-binding mechanism of polyQ aggregates.

Figure 5: Cellular proteins trapped with Htt polyQ aggregates are
disproportionately  composed  of  long  intrinsically-disordered  (ID)  domains. 

(A)  Comparisons  of  the  percentages  of  proteins  with  long  ID  domains  of  the  52  yeast 
proteins  in  Table  2  (reproducibly  found  by  TAPI  to  be  tightly  associated  with  Htt- 
Q103-GFP  aggregates)  versus  100  randomly-selected  yeast  proteins  (Appendix  1). 
Most  of  the  identified  proteins  have  long  ID  domains  of  at  least  100  amino  acids.  (B) 
Comparisons  of  the  percentages  of  proteins  with  long  ID  domains  of  the  91  rat 
proteins  in  Table  3  (reproducibly  found  by  TAPI  to  be  tightly  associated  with  GFP- 
Htt-Q74  aggregates)  versus  200  randomly-selected  rat  proteins  (Appendix  2).  ID 
domains  are  defined  as  regions  of  30  or  more  amino  acids  with  IUPred  scores  of  0.5 
or greater (50; 51). Chi-Square Fisher’s Exact test (GraphPad software) was used to
determine significance between TAPI-identified proteins and proteome control sets.

Biochemical confirmation of protein recruitment to polyQ aggregates in yeast

Def1p, Ent2p, Sgt2p and Bmh1p are among the proteins we found to be specific
to polyQ aggregates in yeast. We also found that mammalian homologs of Ent2p, Sgt2p
and Bmh1p (CLINT1, SGTA and YWHAB, respectively) co-aggregate with polyQ in
mammalian cells (discussed in detail below). Ent2p, Sgt2p and Bmh1p were previously
shown to have effects on protein aggregation (or toxicity) in yeast models (52; 103; 229),
while Def1p has not been shown to influence protein aggregation. We selected Def1p,
Ent2p, Sgt2p and Bmh1p for biochemical confirmation of the MS results.

The presence of Def1p, Ent2p, Sgt2p and Bmh1p in polyQ aggregates was tested
by immunoblotting following a modified version of our TAPI protocol (Figure 6).
HttQ103-GFP, expressed in yeast, forms a high molecular weight aggregate that
partitions to the pellet fraction and remains at the top of an SDS-PAGE gel (Figure
6A). When HA-tagged Def1p, Ent2p, Sgt2p and Bmh1p are co-expressed with HttQ103-GFP,
they show a similar pattern in their respective western blots, but only when
expressed with the long polyQ expansion, not with HttQ25-GFP (Figure 6B). Thus, the
interactions of all four proteins with polyQ aggregates are sufficiently strong that they
co-fractionate and withstand the conditions of SDS-PAGE, resulting in their retention in
the large resistant species that cannot migrate into the gel (Figure 6B). The His3 protein
was chosen as a negative control as it was never identified in our TAPI samples.
Immunoblotting confirms that, unlike Def1p, Ent2p, Sgt2p and Bmh1p, HA-tagged His3p is not
entangled within polyQ aggregates (Figure 6B), thus recapitulating our MS results
biochemically. To ensure that proteins were not independently forming large
detergent-resistant aggregates as a consequence of cellular stress caused by HttQ103-GFP, cells
were treated with two alternative stresses: the proteasome inhibitor MG-132 and
overexpressed human α-synuclein protein, which is toxic in yeast cells (153). Neither
condition resulted in Sgt2p getting stuck at the top of an acrylamide gel (Figure 7).

Figure 6: Western blotting confirms that TAPI-identified proteins are trapped in
large, detergent-resistant Htt-polyQ aggregates.

(A) Immunoblotting reveals that Htt-Q103-GFP, but not Htt-Q25-GFP, forms large
detergent-resistant aggregates that fractionate in the pellet (partial TAPI
purification; see methods) and remain at the top of an acrylamide gel under SDS-PAGE
conditions. (B) Immunoblotting confirms that HA-tagged Bmh1p, Def1p,
Ent2p, and Sgt2p (proteins identified by mass spec) get trapped in the
detergent-resistant aggregates that can be seen stuck at the top of the gels in the pellet
fractions. As a negative control, HA-tagged His3p (not identified by mass spec)
shows no susceptibility to co-aggregation with Htt-Q103-GFP. Note that Def1p was
not easily visualized in the supernatant fraction because it is prone to degradation
(data not shown). Samples were spun at 45,000 rpm, except Ent2p (10,000 rpm). S
= supernatant; P = pellet fraction.

Analysis  of  Htt  polyglutamine  aggregate-associated  proteins  in  PC-12  cells 

With  the  observation  that  RBPs  and  proteins  with  ID  or  Q/N-rich  domains  are  bound  to 
HttQ103-GFP  aggregates  in  the  yeast  model,  we  asked  if  the  same  would  be  true  in  a 
mammalian  system.  To  test  this,  another  Huntington  disease  model  was  used: 
mammalian PC-12 cells stably expressing doxycycline-inducible HttQ23-GFP or
HttQ74-GFP developed by David Rubinsztein’s lab (239). Again, both constructs contain
the Htt exon 1 fragment with a polyQ tract (23 or 74) fused in frame with GFP, and only
the protein with pathogenic extended polyQ forms SDS-resistant aggregates (Figure 8A).
In this model, the polyQ aggregates have an approximately equal distribution in the
cytoplasm and nucleus (239), and thus may interact with most of the non-secreted cellular
proteome.  The  amyloid-forming  HttQ74-GFP,  but  not  HttQ23-GFP,  could  be 
successfully  purified  and  detected  using  the  TAPI  method,  as  shown  in  Figure  8B.  MS 
analysis  followed  by  comparison  of  all  identified  proteins  showed  a  subset  that  was 
unique  to  samples  with  polyQ  aggregates.  Using  the  criteria  described  above,  91  proteins 
were  considered  specific  to  the  HttQ74-GFP  aggregates  (Table  3;  Appendix  2). 

Molecular  functions  of  HttQ74  aggregate-associated  proteins  from  PC-12  cells 

Characterization of the proteins enriched in polyQ aggregates from PC-12 cells reveals a
disproportionate number of RBPs, as similarly observed in yeast (Table 3; Appendix 1 &
2). Also, several functional homologs were common to the polyQ aggregates isolated
from both the yeast and mammalian cells (Table 5): RNA-binding proteins DDX5 (yeast
Dhh1p) and hnRNPA3 (yeast Hrp1p), 14-3-3 proteins YWHAB and SFN (yeast Bmh1p),
endocytosis proteins CLINT1 (yeast Ent1/2p) and AAK1 (yeast Akl1p), and chaperone
proteins SGTA (yeast GET pathway protein Sgt2p), DNAJA2 and DNAJA4 (yeast Ydj1p
and Apj1p).

Figure 7: Proteasomal inhibition does not affect proteins trapped in Htt-polyQ
aggregates.

Proteasomal inhibition or proteotoxic stress is not sufficient to cause Sgt2p to be
trapped in (or form) detergent-resistant high-molecular weight aggregates. Neither
the overnight exposure (~16 hours) of yeast cells to 10 µM MG-132 nor the
overnight expression of human alpha-synuclein altered the electrophoretic
migration of yeast Sgt2p. However, the accumulation of Q103-GFP into a large
detergent-resistant species was sufficient to affect Sgt2p’s migration into an
acrylamide gel.

Figure 8: Polyglutamine-expanded Huntingtin exon 1 forms aggregates in PC-12
cells that can be isolated by TAPI.

(A) Fluorescence microscopy of PC-12 cells expressing doxycycline inducible
transgene GFP-tagged Huntingtin exon 1 (Htt) with normal (Q23) or expanded
polyglutamine tract (Q74). (B) Western blot of GFP-Htt-Q23 and GFP-Htt-Q74
showing high molecular weight aggregates can be isolated from Htt-Q74 expressing
PC-12 cells. Lysate = input; TAPI = purified aggregates.

Biophysical  properties  of  proteins  that  associate  with  polyQ  aggregates  in  PC-12 
cells 

Biophysical  characterization  reveals  the  HttQ74-GFP  aggregate-associated 
proteins from PC-12 cells are enriched for Q/N-rich regions relative to the entire rat
proteome (7% versus 0.4%), albeit to a much lesser degree than in yeast. Proteins
containing long ID domains (>100 amino acids) are significantly increased among the
TAPI-identified HttQ74-GFP aggregate-associated proteins (31% vs. 18% for proteome
control,  respectively,  p=0.021;  Figure  5;  Appendix  2).  As  in  yeast,  this  suggests  that 
cellular  proteins  with  long  ID  domains  may  be  inherently  prone  to  inclusion  in  polyQ 
aggregates. 
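The kind of comparison reported above (the proportion of proteins with long ID domains in the TAPI set versus a random proteome control) is a 2x2 contingency test. The sketch below implements a two-sided Fisher's exact test using only the Python standard library. It is an illustration under stated assumptions rather than the GraphPad analysis used here: the counts in the example (28 of 91 and 36 of 200) are back-calculated by rounding the reported percentages and are not taken from the appendices, so the resulting p-value need not match the reported one exactly.

```python
from math import comb


def hypergeom_pmf(a: int, row1: int, col1: int, total: int) -> float:
    """Probability of seeing `a` in the top-left cell of a 2x2 table with fixed margins."""
    return comb(col1, a) * comb(total - col1, row1 - a) / comb(total, row1)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables (with the same margins) that are no more
    likely than the observed one.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - total)
    highest = min(row1, col1)
    p_observed = hypergeom_pmf(a, row1, col1, total)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p_x = hypergeom_pmf(x, row1, col1, total)
        if p_x <= p_observed * (1 + 1e-7):  # small tolerance for floating-point ties
            p_value += p_x
    return min(1.0, p_value)


if __name__ == "__main__":
    # Assumed counts: rows = TAPI set vs. random control,
    # columns = with vs. without an ID domain of >100 amino acids.
    tapi_with, tapi_without = 28, 63        # roughly 31% of 91
    control_with, control_without = 36, 164  # 18% of 200
    p = fisher_exact_two_sided(tapi_with, tapi_without, control_with, control_without)
    print(f"two-sided p = {p:.4f}")
```

An exact test is a reasonable choice at these sample sizes because it makes no large-sample approximation; the same function could be applied to the yeast comparison once the underlying counts are read from Appendix 1.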

Neurodegenerative  disease-linked  proteins  are  recruited  into  polyQ  aggregates 

Among the aggregate-specific proteins in PC-12 cells, we also identified a
significant  subset  of  proteins  that  are  neurodegenerative  disease-associated  (Table  4). 
Surprisingly,  these  proteins  were  not  limited  to  huntingtin-interacting  proteins;  we 
identified  a  cadre  of  ALS-linked  proteins.  We  hypothesized  polyQ  aggregates  could  pull 
in  proteins  that  are  prone  to  aggregation  in  other  pathological  contexts.  When  we  probed 
the purified polyQ aggregates (purified fraction confirmed in Figure 9A) with antibodies
specific to proteins that are known to aggregate in the motor neurons of ALS patients
(and were identified by MS in this study), we corroborated the specific presence of FUS,
TDP-43,  and  UBQLN2  in  the  Htt-Q74  aggregates  (Figure  9B).  We  hypothesized  that 
other  ALS-linked  proteins  may  be  similarly  recruited  into  aggregates  but  escaped 
detection by MS. Immunoprobing for the ALS-linked HNRNPA1 also revealed its
presence in the purified polyQ aggregates (Figure 9B). For a control, we probed for the
kinase ERK, which was not identified by MS in our samples, and indeed, it could not be
found in the polyQ aggregates (Figure 9A). In total, of the HttQ74-GFP
aggregate-associated proteins in PC-12 cells, 21% (19/91) have previously been found in the
intraneuronal  inclusions  of  various  neurodegenerative  diseases  (Table  4).  Of  these 
disease-linked  HttQ74-GFP  aggregate-associated  proteins,  many  are  RBPs  (7/19)  and 
more  than  half  (9/19)  contain  very  long  ID  domains  (>100  amino  acids)  (Table  4; 
Appendix  2). 

Since  one  fifth  of  the  proteins  we  identified  have  been  previously  linked  to  pathological 
aggregates,  perhaps  many  of  the  remaining  proteins  represent  novel  aggregate-associated 
proteins  that  have  never  been  specifically  probed  in  various  pathological  contexts.  We 
selected HSPA8, CLINT1 and SGTA as candidate proteins that could potentially be
recruited into pathological aggregates in neurodegenerative disease. We chose CLINT1
and SGTA because, intriguingly, their homologs (Ent2p and Sgt2p, respectively) were also
identified in our yeast model. The Hsp70 protein HSPA8 was selected because Hsp70s
have been previously suspected to play critical roles in neurodegenerative disease (14;
133; 138). We confirmed by immunoblotting the presence of all three proteins in the
highly-purified Htt-Q74-GFP aggregates from PC-12 cells (Figure 9B).

Figure 9: Confirmation of polyQ-associated proteins from PC-12 cells identified by
TAPI. 


(A)  Western  blotting  shows  that  the  addition  of  doxycycline  to  the  PC-12  cell  model 
induces  the  expression  of  HttQ74-GFP,  resulting  in  aggregates  that  can  be  purified 
by  TAPI.  The  kinase  ERK  is  probed  as  a  negative  control;  ERK  was  never  identified 
by  mass  spectrometry,  so  is  not  expected  to  co-fractionate  with  polyQ  aggregates. 

(B)  Western  blot  analysis  of  TAPI-purified  polyQ  aggregates  from  PC-12  cells 
confirms  the  presence  of  several  disease-associated  proteins  only  in  the  Htt-Q74 
samples.  All  proteins  migrated  near  their  predicted  molecular  weights.  For  control, 
the  TAPI  procedure  was  conducted  in  parallel  on  the  induced  Htt-Q23  cell  line  (FUS, 
TDP-43,  UBQLN2,  HNRNPA1)  or  the  un-induced  Htt-Q74  cell  line  (CLINT1,  HSPA8, 
RAD23B, SGTA). (C) Confocal microscopy shows localization of identified proteins to
Htt-Q74 aggregates in PC-12 cells. (Left) RAD23B, nominally a DNA repair protein,
localizes to nuclear Htt-Q74 inclusions but not cytoplasmic inclusions. (Middle) FUS,
an RNA-binding protein, localizes to nuclear and cytoplasmic Htt-Q74 inclusions.
(Right) CLINT1, a clathrin-interacting protein, is observed in cytoplasmic Htt-Q74
aggregates. Arrows indicate foci with co-localized proteins. Green = GFP; Magenta =
CLINT1, FUS or RAD23B in merge.

Confirmation of recruitment of identified proteins to polyQ aggregates by
immunocytochemistry 

The  localization  of  selected  polyQ  aggregate-associated  proteins  was  also 
confirmed by confocal microscopy (Figure 9C). The proteins CLINT1, RAD23B and
FUS  were  selected  because  they  appeared  to  be  particularly  strong  hits  based  on  the 
initial  analysis  of  the  polyQ  mass  spectrometry  data  and  Western  blot  results.  Using 
immunocytochemistry,  each  was  observed  to  be  aberrantly  recruited  to  the  major  sites  of 
HttQ74-GFP  aggregation  (Figure  9C).  However,  despite  strong  over-expression  of 
HttQ74-GFP  and  the  formation  of  large  intracellular  aggregates,  the  recruited  proteins  did 
not  appear  to  completely  localize  to  the  aggregates.  In  fact,  only  a  fraction  of  each 
protein’s  respective  total  was  found  at  the  aggregate. 

Intrinsically-disordered  domains  play  a  role  in  the  recruitment  of  proteins  to  polyQ 
aggregates 

We  selected  yeast  Sgt2p  and  human  FUS  to  further  examine  the  role  of  ID 
domains  in  localization  to  Htt-polyQ  aggregates.  Sgt2p  is  involved  in  protein  quality 
control  and  does  not  contain  a  known  RNA-binding  domain  (RBD).  Interestingly,  we 
also  identified  Sgt2p’s  mammalian  homolog,  SGTA,  among  the  proteins  associated  with 
polyQ aggregates in PC-12 cells. The FUS protein contains a distinct RBD and a long
N-terminal ID domain, and when expressed in yeast, exhibits aggregation and toxicity
reminiscent of what is observed in diseased motor neurons (61; 95; 111; 196).

To determine the contribution of their respective ID domains toward recruitment
to polyQ aggregates, we created expression vectors in which the major ID domain (as
determined by IUPred-L) of both FUS and Sgt2p is deleted (FUSΔID = FUSΔ1-134;
Sgt2ΔID = Sgt2Δ300-346). The full-length versions of FUS and Sgt2p, or their ΔID
counterparts, were co-expressed with either HttQ25-GFP or HttQ103-GFP. We truncated
the  TAPI  protocol  to  easily  evaluate  the  co-localization  of  the  proteins  with  polyQ 
aggregates  (lysate  partitioning;  see  methods).  When  we  isolated  the  Htt-polyQ  aggregates 
by  lysate  partitioning  from  the  Sgt2-transformed  strains  (shown  in  Figure  10A  as  the 
resistant  species  stuck  at  the  top  of  the  gel),  we  observed  an  enrichment  of  full-length 
Sgt2p  in  the  Htt-polyQ  high  molecular  weight  aggregates  (Figure  10B,  left  panel). 
However,  when  the  major  ID  domain  was  deleted,  most  of  the  co-localization  with  the 
polyQ  aggregate  is  eliminated  (Figure  10B,  right  panel).  The  exact  same  pattern  was 
observed for FUS and FUSΔID (Figures 10C and 10D, respectively). For comparison, we
also  used  an  engineered  variant  of  FUS,  which  has  leucines  in  place  of  four  conserved 
phenylalanines  (FUS(4F-L):  amino  acids  305,  341,  359  and  368)  in  the  RNA  recognition 
motif  (RRM)  domain.  FUS(4F-L)  was  previously  shown  to  be  RNA-binding  incompetent 
(38).  We  observed  that  FUS(4F-L)  was  recruited  to  polyQ  aggregates  as  readily  as  wild- 
type  FUS  (Figure  10E,  F),  thus  suggesting  that  RNA  binding  may  not  play  a  significant 
role  in  facilitating  a  protein’s  inclusion  into  polyQ  aggregates. 

Discussion 

Compositional Analysis of Polyglutamine Aggregates using TAPI

Models  of  huntingtin  exon  1  mimic  truncated  versions  of  huntingtin  found  in 
intraneuronal  aggregates  (131),  and  thus  are  helpful  for  studying  intracellular 
aggregation.  Here  we  analyze  the  protein  species  that  get  recruited  into  amyloid-like 
aggregates  formed  by  polyQ-expanded  Huntingtin  exon  1  in  both  yeast  (Q103)  and 
mammalian cells (Q74). The distinctive tinctorial properties and detergent resistance of
the aggregates indicate that they are in an amyloid-like state (Figure 3B).

Figure 10: The ID domains of Sgt2p and FUS mediate their localization to Htt-polyQ
aggregates in yeast cells.

(A, B) Western blots of lysates from yeast strain W303 expressing HttQ25-GFP or
HttQ103-GFP in combination with HA-tagged Sgt2p or Sgt2ΔID (A = α-GFP; B = α-HA).
(C, D) Western blots of cells expressing HttQ25-GFP or HttQ103-GFP in combination
with FUS or FUSΔID (C = α-GFP; D = α-FUS & α-β-actin). Because FUS is quickly
degraded in non-denaturing conditions, input controls using urea lysis of cells were
included to show initial protein loads. (E, F) Western blots of FUS or FUS(4F-L) in
HttQ25-GFP-expressing or HttQ103-GFP-expressing cells.

Proteins  that  specifically  associate  with  polyQ  aggregates  are  hypothesized  to 
play  either  positive  or  negative  roles  in  pathogenic  processes.  Previous  attempts  to 
identify the proteins that interact with huntingtin have employed yeast two-hybrid,
immunoprecipitation and immunohistochemical screening (47; 75; 80; 97; 155; 160;
171; 208; 228). The overlap in identified proteins from different methods is often low, or
the identification of hundreds of proteins limits clarity. Also, antibody-based screening is
limited to a predetermined set of functioning antibodies and thus may overlook
unexpected  interactions.  Our  mass  spectrometry-coupled  approach,  called  TAPI,  is 
specific  to  the  most  chemically-resistant  forms  of  protein  aggregates  and  eliminates  the 
need  to  excise  individual  bands  from  acrylamide  gels  (109;  110).  The  high  stringency  of 
TAPI  -  due  to  DNase,  RNase,  and  detergent  treatment  along  with  SDS-gel 
electrophoresis  of  aggregates  -  minimizes  the  identification  of  non-specific  and  loosely- 
associated  proteins.  In  sum,  our  data  set,  in  addition  to  the  work  of  others,  helps  identify 
the  important  factors  that  make  specific  proteins  most  vulnerable  to  inclusion  into  polyQ 
aggregates. 

Our  analysis  in  both  yeast  and  mammalian  systems  revealed  a  compact  group  of 
proteins  enriched  in  polyQ  aggregates.  In  both  cell  types,  the  proteins  recruited  to 
aggregates  belonged  to  common  functional  classes  (Tables  1  &  2);  RNA-binding  proteins 
and  endocytosis-related  proteins  were  disproportionately  found  in  all  Htt-polyQ  samples. 
Importantly,  the  proteins  identified  by  mass  spectrometry  could  be  independently 
confirmed  by  immuno-blotting.  Moreover,  our  findings  are  supported  by  a  previous  study 
in Neuro 2A cells, which coupled Sarkosyl treatment with conventional 1D-gel
separation and manual band excision to identify polyQ-associated proteins (138). Among
the  twelve  proteins  identified  by  Mitsui  and  coworkers,  8  were  also  identified  by  our 
approach  in  PC-12  cells  (EEF1A1,  HSPA8,  HSP90AB1,  MLF2,  PSMC1,  RAD23B, 
UBQLN2,  YWHAZ;  Table  1),  suggesting  that  across  cell  types,  certain  proteins 
consistently  have  a  high  propensity  for  inclusion  into  polyQ  aggregates.  Moreover, 
functional  orthologs  are  common  to  both  the  yeast  and  mammalian  samples 
(YWHAZ/Bmh1/2p, DDX5/Dhh1p, SGTA/Sgt2p, CLINT1/Ent1/2p, hnRNPA3/Hrp1p
and HSPA8/Ssa1p). The recruitment of SGTA/Sgt2p and CLINT1/Ent2p into polyQ
aggregates was confirmed in both the yeast and mammalian systems by immunodetection
methods. This overlap - not only between TAPI samples, but also across
divergent organisms - suggests that these specific proteins (or their properties) may play
a role in processes linked to pathological polyQ aggregation. This is supported in the
literature where Ssa1p (134), HSP70 (147) and Sgt2p (228) have all been directly
connected to polyQ aggregates in various models. As for the clathrin-interacting proteins
CLINT1 and Ent2p, their presence suggests that aggregates can have important
interactions  with  vesicle-dependent  processes. 

RNA  binding  proteins  are  disproportionately  recruited  to  polyglutamine  aggregates 

RNA-binding  proteins  were  highly  represented  in  the  Htt-polyQ  TAPI  data  sets 
from  yeast  and  rat  cells.  The  presence  of  quality-control  proteins,  such  as  chaperones, 
was  not  particularly  surprising  since  Htt-polyQ  forms  toxic  intracellular  aggregates. 
However,  the  enrichment  of  RNA-binding  proteins  was  unexpected,  especially 
considering  the  extensive  nuclease  treatment  that  is  used  prior  to  isolation  of  the 
aggregates (Figure 3). RNA-binding proteins have been shown to contribute to the
pathologies of a number of neurodegenerative diseases (34; 158). The aggregation of
RNA-binding proteins is transient in normal cellular homeostasis, but aggregates may
accumulate in neurodegenerative diseases due to pathological alteration of assembly and
clearance pathways (158). This aberrant accumulation is frequently tied to interactions mediated
by “prion-like” domains (105; 119), which are intrinsically unstructured domains that
resemble the domains that enable certain yeast proteins to adopt self-propagating amyloid
conformations. It is this intrinsically-disordered property that is likely responsible for the
abundance of RNA-binding proteins in polyQ aggregates, because nucleic acid-interacting
proteins frequently have intrinsically unstructured regions. Thus, our results
suggest that a protein’s ID domain, not RNA binding per se, may be the major
determinant of inclusion into Htt-polyQ aggregates (Figure 10C, D). This is corroborated
by our observation that disrupting the RNA-binding domain of FUS had no effect on its
inclusion in polyQ aggregates (Figure 10E, F).

Mitochondrial  proteins  are  found  in  polyglutamine  aggregates 

Mitochondrial  proteins  represent  a  significant  fraction  of  the  polyQ-associated 
proteins  in  yeast  and  rat  cells,  8%  and  18%  respectively.  In  yeast,  this  percentage  is  less 
than the proteome representation of mitochondrial proteins (~18% (235)), but in rat cells
this is an over-representation (5-12% (19)). In yeast cells the polyQ aggregates form
primarily in the cytoplasm, and in the rat cells the aggregates are equally in the nucleus
and cytoplasm. Mitochondrial proteins are mostly synthesized in the cytoplasm and
transported as unfolded polypeptides into mitochondria post-translationally. A probable
explanation for the presence of mitochondrial proteins in aggregates is that, because of
their unfolded conformation, they are vulnerable to integration into aggregates or, because
of their dependence on chaperones, may be more sensitive to general problems associated
with protein quality control.

Intrinsically  disordered  domains  facilitate  protein  recruitment  to  Htt-polyQ 
aggregates 

Why are proteins with ID domains tightly and disproportionately associated with
Htt-polyQ aggregates? Not only did we observe an enrichment of ID domain-containing
proteins, but previous proteomic studies of polyQ also reveal data sets that are rich in
such proteins (97; 160; 244). Ratovitski et al. observed that proteins with ID domains of
>30 amino acids were enriched in aggregates formed by Htt-50Q in HEK293 cells. This
is consistent with our results, although we found significant enrichment of proteins with
very long ID domains (>100 aa). Similarly, Raychaudhuri et al. performed a bioinformatic
analysis of intrinsic disorder in neurodegenerative disease-associated proteins. Their
bioinformatics dataset (obtained from an Entrez Gene database keyword search), when
compared to control datasets, indicates an increase in intrinsic disorder (using FoldIndex)
for Huntington disease-associated proteins with ID domains up to 100 amino acids in
length (161). Intrinsically unstructured regions frequently facilitate molecular interactions
or serve as sites of post-translational modification (205), and are prominent features of
many nucleic acid-binding proteins and chaperones (85; 204). These domains generally
lack hydrophobic residues sufficient for adopting a folded structure in an aqueous
environment (214) and thus may be more prone to co-aggregation simply by virtue of
their accessibility. We demonstrated that elimination of the major ID domains of two
proteins eliminated their co-aggregation with polyQ (Figure 10).
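
The length thresholds discussed here (>30 and >100 residues) can be made concrete with a short sketch. The Python below is illustrative only: it assumes per-residue disorder scores between 0 and 1, as a predictor such as IUPred produces, and treats scores above 0.5 as disordered. The function name, the cutoff, and the toy score profile are assumptions for this example, not the pipeline used in this work.

```python
from typing import List, Tuple


def disordered_segments(scores: List[float],
                        cutoff: float = 0.5,
                        min_len: int = 30) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, 1-based and inclusive, for each run of
    consecutive residues scoring above `cutoff` that spans at least
    `min_len` residues."""
    segments = []
    run_start = None
    for i, score in enumerate(scores):
        if score > cutoff:
            if run_start is None:
                run_start = i              # a disordered run begins here
        elif run_start is not None:
            if i - run_start >= min_len:   # run ended at residue i-1
                segments.append((run_start + 1, i))
            run_start = None
    # Close a run that extends to the C-terminus.
    if run_start is not None and len(scores) - run_start >= min_len:
        segments.append((run_start + 1, len(scores)))
    return segments


# Hypothetical profile: 40 ordered, 120 disordered, 20 ordered, 35 disordered.
toy = [0.1] * 40 + [0.9] * 120 + [0.2] * 20 + [0.8] * 35
segs = disordered_segments(toy)
lengths = [end - start + 1 for start, end in segs]
long_lengths = [n for n in lengths if n > 100]   # "very long" ID domains
print(segs, lengths, long_lengths)
```

Segment lengths obtained this way correspond to the kind of per-protein ID domain lengths reported in Table 4.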

While we do not assert that ID domains are solely responsible for association with
Htt-polyQ aggregates, it is clear that ID domain content plays a prominent role in
recruiting secondary proteins to aggregates. In some cases, quality-control proteins could
even employ intrinsic disorder within a specific domain to facilitate the functional
recognition of misfolded protein aggregates. However, because of the large number of
quality-control proteins that we identified, it cannot be concluded that there is a single
mechanism by which all such proteins are tightly associated with polyQ aggregates; some
proteins, such as UBQLN2, likely only have specific affinity for aggregates following
ubiquitination.

Why  are  neurodegenerative  disease-associated  proteins  recruited  to  polyglutamine 
aggregates? 

Neurodegenerative disease-linked proteins that have been identified in cellular
inclusions in their respective diseases represent a fifth (19/91) of the total TAPI-identified
Htt-polyQ aggregate-associated proteins (Table 4). Examining this subset reveals that
nearly all of them contain ID domains. We observed that some ALS-associated proteins
were trapped in polyQ aggregates. Three of these proteins - FUS, HNRPA1 and TDP-43
- are RNA-binding proteins that form pathological inclusions in certain forms of ALS
and frontotemporal lobar degeneration (FTLD) (26; 37; 125). These proteins have ID
domains that resemble the domains of yeast prion proteins due to similar amino-acid
composition. It has been concluded that these prion-like domains may be primary
aggregating species, but the fact that these ALS-associated proteins are pulled into polyQ
aggregates suggests that in some long-lived cells, such as neurons, there could be
underlying protein quality control problems, with proteins like FUS and TDP-43 getting
preferentially recruited into pre-existing primary aggregates (59). For example, it is
possible that over decades, intermediate-length polyQ expansions in various proteins lead
to persistent aggregates that recruit proteins with ID domains; these inclusions would be
marked (i.e., immuno-positive) by specific aggregation-prone proteins. Alternatively, the
diminution of protein-quality control with aging may create a cellular environment where
proteins with long ID domains are increasingly susceptible to aggregation; thus, polyQ
models and their induced stress may be good tools for identifying proteins that are most
vulnerable to aggregation under conditions of compromised protein-quality control.

Mechanisms  of  Toxicity 

Both the yeast and mammalian model systems show a correlation between polyQ-
expanded huntingtin aggregation and cellular toxicity (134; 239); however, much debate
still persists as to the mechanism by which aggregation leads to cell death (74). One
possible mechanism is that sequestration of proteins to an aggregate may impair cellular
function (174), which is arguably an indirect loss of function. Such sequestration by
aggregates of polyQ-expanded Ataxin3 (spinocerebellar ataxia-causing protein) was
proposed to cause a loss-of-function toxicity (243). Our observations suggest this indirect
loss-of-function toxicity due to sequestration of essential proteins could occur with
huntingtin aggregation as well. We observe altered cellular localization for a subset of
proteins when Htt-polyQ aggregates are present. It is possible that these proteins may
maintain some function while associated with the Htt-polyQ aggregate. If the ID domain
of a protein becomes embedded in the polyQ aggregate while globular or functional
domains remain peripheral, function in the wrong place at the wrong time could be a
gain-of-function toxicity associated with aggregates. Stoichiometrically, this may be
more plausible than loss-of-function because we observe only a fraction of any given
co-aggregating species mis-localized to the polyQ aggregate, which itself is quite abundant
due to over-expression of Htt (Figure 9). Of course, a combination of gain-of-function
and loss-of-function mechanisms could contribute to the overall cellular toxicity.

Although many techniques have been employed to identify huntingtin-interacting
proteins, few examine specifically the amyloid form. Our results show that a select group
of proteins is trapped by polyQ amyloid-like aggregates. Proteins with long ID domains
are disproportionately prone to inclusion, as are many proteins that are associated with
other neurodegenerative diseases. The enrichment in ID domain-containing proteins in
polyQ aggregates, and the elimination of this enrichment when the ID domains are
deleted, reveals the significant role of protein structure in determining whether a protein
is secondarily recruited into certain types of aggregates. Thus, while some proteins might be
predicted to be recruited into aggregates because of their function (i.e., quality-control
proteins or proteins that interact with the soluble form of an aggregating species), many
proteins may be recruited simply as a consequence of their secondary and tertiary
structural elements. The metastable structure and accessibility of long ID domains may
render proteins particularly susceptible to aberrant inclusion in amyloid-like aggregates.
Recently, Habchi and colleagues put forth the idea that ID proteins represent a class of
pharmacological targets (72). As our results suggest, if the recruitment of specific ID
domain-containing proteins into pathological aggregates is critical to cellular
degeneration, then targeting ID domains to reduce their sequestration may have
therapeutic potential in a variety of neurodegenerative diseases.

Table 2: Molecular function of proteins associated with Htt-polyQ aggregates as identified by
TAPI (N=52) in Saccharomyces cerevisiae

Category                       % of total   Protein Name

Protein Quality Control/       12%          Apj1, Bmh1, Def1, Mca1, Sgt2, Ydj1
Chaperone

RNA/DNA Binding                44%          Ccr4, Cyc8, Dhh1, Eap1, Hrp1, Ixr1, Mbf1, Mcm1,
                                            Mot3, Nab3, Nam8, New1, Nrd1, Pbp1, Pin4, Pop2,
                                            Puf3, Snf5, Srp54, Taf5, Tup1, Whi3, Ygr250c

Mitochondrial                  8%           Apj1, Nam8, Puf3, Ynl208W

Endocytosis, vesicle &         21%          Akl1, Ent1, Ent2, Gts1, Pan1, Scd5, Sla1, Pin3,
cytoskeletal transport                      Sec24, Yap1801, Yap1802

Other                          21%          Cbk1, Epo1, Gal2, Mum2, Nup57, Nup100, Nup116,
                                            Pgd1, Sml1, Slm1, Ylr177w

Molecular function determined by gene ontology and Saccharomyces Genome Database. Proteins in bold represent
those with overlapping cellular functions; totals exceed 100% because of multiple categorizations. Yeast ribosomal
proteins Rpl30 and Rps6b were excluded.
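
The percentages in Table 2 can be reproduced from the number of protein names listed under each category. The short Python check below is a sketch; the counts are simply tallied from the table entries, and the variable names are illustrative.

```python
# Number of protein names listed per category in Table 2 (tallied by hand).
N_TOTAL = 52
category_counts = {
    "Protein Quality Control/Chaperone": 6,
    "RNA/DNA Binding": 23,
    "Mitochondrial": 4,
    "Endocytosis, vesicle & cytoskeletal transport": 11,
    "Other": 11,
}

# Percentage of the 52 identified proteins that fall in each category.
percentages = {name: round(100 * n / N_TOTAL)
               for name, n in category_counts.items()}

# A protein placed in two categories is listed twice, so the listings
# outnumber the unique proteins and the percentages sum past 100.
total_listings = sum(category_counts.values())
extra_listings = total_listings - N_TOTAL

for name, pct in percentages.items():
    print(f"{name}: {pct}%")
print("sum of percentages:", sum(percentages.values()))
print("listings beyond N:", extra_listings)
```

The three extra listings correspond to Apj1, Nam8, and Puf3, each of which appears under two categories.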


Table 3: Molecular function of proteins associated with Htt-polyQ aggregates as identified by
TAPI (N=91) in Rattus norvegicus

Category                       % of total   Protein Name

Protein Quality Control/       26%          Adrm1, Cltc, Ddi2, Dnaja2, Dnaja4, Dnajb1, Dnajc7,
Chaperone                                   Hspa8, Hsp90aa1, Lap3, Ppia, Psmb2, Psmc1, Psmc2,
                                            Psmc3, Rad23b, Sgta, Sqstm1, Sumo2, Ubqln2, Ubqln4,
                                            Ubxn711, Usp7, Vps35

RNA/DNA binding*               37%          Aars2, Akap8l, Arl6ip4, Atp5a1, Atrx, Ddx5, Dync1h1,
                                            Eif4g1, Eif4g2, Fasn, Fus, Gigyf2, Hnrnpa3, Hnrnpf,
                                            Hnrnph2, Hnrnpm, Hnrnpu, Hsp90aa1, Matr3, Nono,
                                            Nr3c1, Pcbp1, Ppia, Prpf40a, Prrc2b, Rbms1, Sf1,
                                            Tardbp, Tcerg1, Tcf20, Tnrc6b, Tufm, Xrn2, Ythdf1

Mitochondrial                  18%          Aars2, Acad9, Acsf2, Atp5a1, Cltc, Etfb, Gls, Hadha,
                                            Idh2, Idh3B, Ndufs7, Ogdh, Pck2, Pdhb, Suclg2, Tufm

Endocytosis, vesicle &         14%          Aak1, Arl6ip4, Asap1, Clint1, Cltc, Cnn2, Dync1h1,
cytoskeletal transport                      Myo1d, Nsfl1c, Rab10, Scyl2, Tfg, Vps35

Other                          15%          Ep300, Gnao1, Kprp, Ldha, Maged1, Mlf2, Phgdh,
                                            Plekhb2, Ppp2r1a, Sik3, Sfn, Tgm3, Thy1, Ywhab

Molecular function assignments were determined by gene ontology, RGD, and NCBI. *RNA-binding was assigned in
some cases on empirical data (20), thus classification does not necessarily imply primary function. Proteins in bold
represent those that were placed in multiple categories, thus totals can exceed 100%. Rat ribosomal proteins Rpl6 and
Rpl13a were excluded.

Table 4: Htt-polyQ aggregate-associated proteins found in inclusions and aggregates in various
neurodegenerative diseases.

Name             Disease or Disease Model  ID Domain                      RBP  PQC  Reference

Aak1             ALS-SOD1                  209, 81, 50                    -    -    (185)
Fus              ALS                       284, 71, 87                    yes  -    (48; 116; 217)
Hnrnpa3          ALS                       39                             yes  -    (141)
Hspa8            HD & SCA1                 46, 36                         -    yes  (36; 87)
Matr3            ALS                       31, 57, 42, 119, 72            yes  -    (92)
Mlf2             HD                        136                            -    -    (104)
Nono             ALS-FUS                   66                             yes  -    (183)
Ppia             ALS-SOD1                  None                           yes  yes  (7)
Rad23b           HD                        120, 67, 41                    -    yes  (10; 224)
Sqstm1           PD & ALS                  153                            -    yes  (115; 139; 248; 250)
Suclg2           AD                        None                           -    -    (159)
Sumo2            HD & ALS                  None                           -    yes  (146; 149)
Tardbp (TDP-43)  ALS                       45, 55                         yes  -    (96; 192)
Tcerg1           HD                        138, 177, 70, 36               yes  -    (80)
Tcf20            HD                        87, 739, 50, 32, 709, 34, 121  yes  -    (242)
Tfg              CMTD                      35, 54, 105                    -    -    (84)
Tgm3             HD                        49                             -    -    (246)
Ubqln2           ALS                       62, 58, 42, 81, 31, 84         -    yes  (43)
Ywhab            ALS                       None                           -    -    (100)

The numbers for ID Domains indicate the lengths of distinct unstructured regions >30 residues, for the
identified rat proteins from PC-12 cells, as determined by IUPred-L. RBP = RNA-binding
protein as described in the legend of Table 3; PQC = protein quality control.

Table 5: Analogous proteins in yeast and rat associate with Htt-PolyQ aggregates.

Rat TAPI Protein              Yeast TAPI protein   Function

Aak1                          Akl1                 Ser/Thr kinases involved in endocytosis and actin
                                                   cytoskeleton organization; recruit endocytic
                                                   accessory factors

Sfn/Ywhab                     Bmh1                 14-3-3 proteins; associated with diverse protein
                                                   binding and signaling activities

Dnaja2/Dnaja4/                Ydj1/Apj1            HSP40 co-chaperones; function with HSP70s;
Dnajc7/Dnajb1                                      generally involved in protein folding and quality
                                                   control

Hnrnpa3                       Hrp1                 Heterogeneous nuclear ribonucleoproteins; bind
                                                   RNA; involved in RNA processing

Sgta                          Sgt2                 Glutamine-rich cytoplasmic co-chaperone; functions
                                                   in post-translational membrane insertion of
                                                   proteins in yeast

Clint1                        Ent1/Ent2            Epsin-like proteins involved in endocytosis;
                                                   clathrin interactors

Ddx5                          Dhh1                 Cytoplasmic DExD/H-box RNA helicases; multiple
                                                   RNA-related functions

Protein similarity determined using Ensembl comparative genomics, RGD, SGD, and % identity.

CHAPTER 3: Discussion and overall conclusions


Mechanism of Htt-polyQ Toxicity

Disease research has revealed that amyloid-like protein aggregates are associated
with the pathology of many neurodegenerative diseases (33; 107; 165; 212). These
aggregates were shown to contain not only the disease-associated protein, but also many
other cellular proteins. In this work we have utilized TAPI to identify aggregate-
associated cellular proteins. Using deletional analysis, we have also determined that the
intrinsically disordered (ID) domains within two of these proteins - Sgt2 and
FUS - are required for their association with Htt-polyQ aggregates.

ID  Domains  in  Htt-polyQ  Aggregates 

ID domains are regions of 30 amino acids or greater that contain a high
proportion of residues with low hydrophobicity and high net charge, resulting in a less
compact structure, and are defined by the lack of globular structure (168). Even though
these regions do not form a folded structure or share sequence homology, proteins
containing ID domains perform many functions. These include post-translational
modification of proteins, binding RNA or DNA, and acting as chaperones in protein
folding (Tables 2 & 3). Intriguingly, biophysical analysis of the Htt-polyQ interacting
proteins that we have identified reveals that the functions of proteins containing regions
of intrinsic disorder are largely the same as those identified for ID domain proteins in
general. Previous work by Ratovitski and colleagues as well as Raychaudhuri and
colleagues suggests that ID domains may be shared among cellular proteins recruited to
amyloid-like aggregates (160; 161).

Interest in intrinsic disorder has increased substantially in the last fifteen years as
the identification of proteins containing ID domains has concomitantly grown (162).
Currently it is estimated that between 15 and 45% of eukaryotic proteins contain ID
domains 30 amino acids or greater in length (81; 140). ID domains are conformationally
indistinct and defined by their lack of ordered structure; these regions exist as highly
dynamic ensembles of conformations, enabling them to interact with many binding
partners (2; 140; 162; 168). The plasticity of these regions lends specific advantages to
the proteins containing them.

First, the constant change in conformation and the disorder-to-order transition upon
target binding of ID domains result in a decoupling of specificity and affinity, producing
high-specificity/low-affinity interactions. The disorder-to-order transition upon target
binding also increases the speed of interaction (168). The interaction
surfaces of ID domains are large and contain multiple molecular recognition features
(MoRFs), allowing multiple distinct interactions to occur within a common binding
surface (81; 129; 162; 168; 219). In addition, ID domains are promiscuous interactors,
resulting in the classification of ID domain-containing proteins as ‘hub’ proteins in
cellular networks (81; 129; 162; 168).

The constant fluctuation in ID domain conformation means a single structure for
these regions does not exist. However, many methods have recently been utilized in
attempts to characterize the structural ensembles of ID domains, including nuclear
magnetic resonance (NMR) spectroscopy, small angle X-ray scattering, single molecule
fluorescence, molecular dynamics (MD) simulation, paramagnetic relaxation
enhancement (PRE), and circular dichroism (2; 90; 102; 106; 140). The most successful
of these methods are NMR spectroscopy and MD simulations, resulting in a database of
conformational ensembles of ID domain-containing proteins (218).

Although ID domains are necessary and incredibly useful in cellular pathways,
failure to regulate ID domain behavior is linked to a variety of diseases (2; 94; 107; 189;
215). Many of the disease-linked proteins associated with neurodegenerative diseases
contain ID domains (215). In fact, the proposed mechanism of polyQ toxicity arose from
the observation that proteins with longer polyQ repeats, irrespective of protein context,
form insoluble aggregates (211). Biophysical characterization of polyQ peptides of varying
length reveals they are intrinsically disordered (24). Here we have shown that the
mechanism of toxicity in HD seems to be intrinsically linked to ID domains, both in the
disease-linked huntingtin protein and in those proteins recruited to Htt-polyQ aggregates
in disease models (232). Considering that there are at least eight other neurodegenerative
diseases associated with aggregating proteins containing expanded polyQ regions, it is of
great interest to investigate whether this theory extends beyond HD to include all of these
polyQ-associated neurodegenerative diseases.


CORRELATION OF TAPI FINDINGS WITH OTHER STUDIES OF Htt-
polyQ-ASSOCIATED PROTEINS

Our analysis also revealed that a significant fraction of the proteins identified in
Htt-polyQ aggregates were implicated in other neurodegenerative diseases (19/91) (232)
(Table 4). The finding that aggregate-associated proteins of one neurodegenerative
disease are implicated in another suggests a unifying feature may exist linking all of these
diseases. A significant portion of the aggregate-associated proteins also contain ID
domains. In the future, it would be of interest to determine whether ID domains represent
a shared biophysical attribute amongst proteins recruited to neurodegenerative disease-
related aggregates.

Our TAPI analysis also revealed novel proteins associated with amyloid-like Htt-
polyQ along with cellular proteins previously shown to be associated with HD (Tables 2
& 3). Of particular interest are two subsets: proteins conserved in both yeast and
mammalian model systems, and cellular proteins identified in protein aggregates of other
neurodegenerative diseases (Table 4). Our TAPI analysis revealed a cohort of proteins
conserved between mammalian and yeast systems - YWHAB/Bmh1/2p, DDX5/Dhh1p,
SGTA/Sgt2p, CLINT1/Ent1/2p, hnRNPA3/Hrp1 and HSPA8/Ssa1p. The association of
two of these pairs, SGTA/Sgt2p and CLINT1/Ent2p, with Htt-polyQ aggregates was
further confirmed by immunoblotting. These results suggest that
comparison of our TAPI results with those of other studies may reveal a smaller subset to
target for further investigation.

Several studies of protein aggregates associated with Htt-polyQ relied on a priori
knowledge of target proteins or hypotheses of associated proteins (66; 101; 132; 155;
228; 237). These studies sought to confirm the aggregation of previously identified
proteins with Htt-polyQ. However, two studies in the yeast system examine Htt-polyQ
aggregate-associated proteins without bias (155; 229). Together with our own TAPI
analysis, these three studies make up our meta-dataset. Each of these three studies
utilized different techniques for the isolation and purification of aggregates. TAPI
analysis identified 96 proteins specifically associated with Htt-polyQ (Appendix 4). Park
and colleagues performed Stable Isotope Labeling by Amino Acids in Cell Culture
(SILAC): growth of Htt-polyQ expressing yeast strains in culture media containing
heavy (13C/15N) amino acids, coupled with LC-MS/MS analysis (155). This differential
growth technique allows for inclusion of both the control (light amino acids) and Htt-
polyQ expressing strains (heavy 13C/15N amino acids) in a single mass spectrometry run.
This analysis identified 106 proteins specifically associated with Htt-polyQ (found in just
the heavy 13C/15N mass spec sample) in two of three biological replicates (Appendix 4)
(155). Wang and colleagues expressed the Htt-Q103 plasmid construct followed by 3D
antibody mesh immunoprecipitation (IP) coupled with 2D gel analysis and mass spec
protein identification (228). In this method sequential antibody incubation with the
sample forms an antibody mesh around the targeted aggregate, which is then precipitated
and purified by centrifugation through a sucrose pad. The sample is then run on a 2D gel,
and bands of interest are removed and analyzed by mass spectrometry (Appendix 4) (228).
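
The SILAC criterion described above (identification in at least two of three biological replicates) amounts to a simple count filter. A minimal Python sketch follows; the replicate contents are invented placeholders, not data from Park and colleagues, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def reproducible_hits(replicates: Iterable[Set[str]],
                      min_replicates: int = 2) -> List[str]:
    """Return, sorted, the proteins present in at least `min_replicates`
    of the replicate sets."""
    counts = Counter()
    for rep in replicates:
        counts.update(rep)   # a set contributes at most 1 per protein
    return sorted(p for p, n in counts.items() if n >= min_replicates)


# Placeholder identifiers only (hypothetical heavy-only identifications).
rep1 = {"protA", "protB", "protC"}
rep2 = {"protA", "protC", "protD"}
rep3 = {"protA", "protE"}

hits = reproducible_hits([rep1, rep2, rep3])
print(hits)
```

Raising `min_replicates` to 3 would keep only proteins seen in every replicate, trading sensitivity for stringency.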

Comparison of these datasets - TAPI, 2D gel, and SILAC - revealed just five
proteins identified in all three analyses - Bmh1, Ssa1, Sgt2, Pin3, and YNL208W
(Figure 11). Bmh1 and Ssa1 have been investigated previously with regard to their
contribution to Htt-polyQ aggregation (see below). However, the remaining three
proteins have not previously been studied. Here, we will discuss the current state of our
understanding of the association of these proteins with Htt-polyQ.

Figure 11: Meta-analysis identifies Htt-polyQ aggregate-associated gene candidates.

Three data sets were examined for shared Htt-polyQ aggregate-associated proteins -
SILAC (Park 2013), 2D gel (Wang 2007), and TAPI (Wear 2015). Five candidate
proteins found by all three association studies (see inset) were identified for follow-up study.
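
The three-way comparison summarized in Figure 11 is a set intersection. The Python sketch below illustrates the operation only: the five shared proteins are those named in the text, while every other entry is a made-up placeholder standing in for dataset-specific identifications (the full lists are in Appendix 4 and are not reproduced here).

```python
# Proteins reported in all three studies (from the text).
shared = {"Bmh1", "Ssa1", "Sgt2", "Pin3", "YNL208W"}

# Placeholder entries (hypothetical) pad each dataset for illustration.
tapi = shared | {"tapi_only_1", "tapi_silac_1"}
silac = shared | {"silac_only_1", "tapi_silac_1"}
gel_2d = shared | {"gel_only_1"}

# Proteins found by every method: the central region of the Venn diagram.
in_all_three = tapi & silac & gel_2d

# Proteins found by TAPI and SILAC but not by the 2D gel approach.
in_tapi_and_silac_only = (tapi & silac) - gel_2d

print(sorted(in_all_three))
print(sorted(in_tapi_and_silac_only))
```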


Bmh1 is a 14-3-3 regulatory protein involved in regulation of exocytosis, vesicle
transport, aggresome formation, apoptosis, and post-transcriptional protein levels (12; 15;
31; 63; 216; 241). Other 14-3-3 proteins have previously been reported to associate with
Htt-polyQ aggregates (223), and the level of 14-3-3 protein in the cerebrospinal fluid
(CSF) is currently used as a biomarker and diagnostic factor for the human prion disease
Creutzfeldt-Jakob Disease (CJD) (68; 179). Wang and colleagues showed that deletion of
Bmh1, but not its homolog Bmh2, eliminated Htt-Q103 aggregation in the yeast system,
suggesting that this protein may play an active role in aggregate formation or stabilization
(229). However, deletion of Bmh1 also resulted in increased Htt-Q103 toxicity; this
suggests that aggregates may have lower toxicity than free Htt-polyQ (229).

Ssa1 is a member of the HSP70 chaperone family and has three nearly identical
additional family members, Ssa2, Ssa3, and Ssa4. Hsp70 proteins have been shown to
work with the Hsp100 disaggregase Hsp104 in fractionating amyloid aggregates in yeast.
For a particular aggregate, the prion [URE3], overexpression of Ssa1, but not Ssa2, was
able to completely eliminate the aggregate from yeast cells (181). In a previous study,
Meriin and colleagues examined the effects of Ssa family proteins on Htt-Q103 toxicity
in yeast and found that deletion of Ssa1/Ssa2 suppressed Htt-polyQ cell toxicity (135).
However, deletion of Ssa1/Ssa3 or Ssa1/Ssa4 did not affect Htt-polyQ toxicity (135),
indicating each of these Ssa proteins may have unique functions that are masked in
double deletions.

Sgt2 is implicated as a scaffold protein in the Guided Entry of Tail-anchored (TA)
proteins (GET) trafficking pathway, where it binds the hydrophobic tails of TA proteins to
inhibit their aggregation prior to membrane insertion (103). Chartron and colleagues
examined the interactions of Sgt2 with Get proteins, HSP chaperones, and TA proteins
(22). They found that the C-terminus of Sgt2 mediates the interaction between HSP
chaperones and TA proteins while the N-terminus of Sgt2 binds Get4/Get5 (22). We
found that deletion of the C-terminal ID domain of Sgt2, Sgt2Δ300-346, interrupted
interaction between Sgt2 and Htt-polyQ aggregates (232). The C-terminal region of Sgt2
previously shown to interact with HSP chaperones and TA proteins overlaps with the C-
terminal ID domain, encompassing amino acids 220-346 (22; 103). It is possible that the
C-terminal ID domain of Sgt2 is responsible for interaction with proteins with a higher
propensity to aggregate, like TA proteins and Htt-polyQ. Of particular interest for future
research is to determine whether Sgt2/SGTA interacts with all HSP70 chaperone proteins,
or only specific partners.

Pin3 acts in concert with its paralog Lsb1 to negatively regulate nucleation of actin
filaments (27; 190). Pin3 was also identified in a screen for inducers of the yeast prion
[PIN+], suggesting a role in prion formation (44). Htt-polyQ forms prion-like aggregates,
suggesting that Pin3 may influence Htt-polyQ aggregation as it does yeast prions. Further
investigation will be necessary to determine whether Pin3 associates with Htt-polyQ in a
functional manner.

YNL208W  is  an  uncharacterized  protein  with  no  known  function.  It  is  proposed 
to  interact  with  ribosomes  and  mitochondria  (56;  164).  YNL208W  is  nearly  entirely 
disordered.  In  fact,  biophysical  analysis  of  these  overlapping  proteins  shows  that  the  only 
characteristic  shared  among  all  five  is  that  they  all  contain  at  least  one  ID  domain  (Figure 
12).  This  further  suggests  that  ID  domains  play  a  role  in  recruitment  of  proteins  to  Htt- 
polyQ. 

In the future it will be of interest to characterize the impact of these proteins,
particularly Sgt2, Pin3, and YNL208W, on Htt-polyQ-associated toxicity.

Figure 12: Proteins that alter Htt-polyQ toxicity contain intrinsically disordered
(ID) domains.

TAPI-identified Htt-polyQ interacting cellular proteins confirmed by follow-up analysis
were examined to identify conserved domains using the NCBI conserved domains database
(CDD) as well as ID domains using IUPred-L. Proteins containing 14-3-3 domains are
regulatory molecules involved in signaling, SRC Homology 3 (SH3) domains are found
in proteins of signaling pathways, tetratricopeptide repeat (TPR) domains are involved in
protein-protein interactions and form scaffolds, and chaperone domains are involved in
protein folding and unfolding as well as the assembly and disassembly of protein
complexes.


ID  domains  as  Huntington  Disease  Therapeutic  Targets 

When considering the impact of this research on the area of neurodegenerative
disease research and specifically HD, it is important to examine the possibility for
therapeutic development. Patients diagnosed with HD typically survive only one to two
decades beyond diagnosis. Unlike many other diseases, no therapeutics for HD are
currently able to prolong patient survival. The development of a therapeutic to extend HD
patient survival would have significant impact on both patients and their families.

Research identifying huntingtin-interacting proteins has led to the identification of
molecular pathways involved in HD-associated neuron death. However, no single
molecular pathway identified has been able to account for all neuron death or reverse
progression of HD pathology. The identification of a common link between all
huntingtin-interacting proteins - the ID domain - represents a potential drug target for
HD. The identification of ID domains as a common link is supported by our observation
that deletion of ID domains in select proteins - Sgt2 and FUS - results in attenuation of
target protein interaction with Htt-polyQ. Considering this observation, it seems
likely that ID domains play a role in recruitment of proteins to Htt-polyQ and may
impact disease-associated neuron death.

Recent studies indicate that ID domains represent drug targets for therapeutic
development (72; 81; 94; 129; 140; 207; 211). However, the usual methods for protein-
targeted drug development cannot be applied to ID domains due to the lack of stable
three-dimensional structure. In fact, a search of the current “druggable proteome” by Hu
and colleagues shows that currently known drug targets contain two-fold less disorder than
the human proteome (81). To circumvent these challenges, a number of theoretical
methods have been proposed for the intelligent design of drugs targeting ID domains.
These include (i) inhibition or (ii) modulation of ID domain interactions with binding
partners, (iii) stabilization of the ID domain in its natively disordered state, and
(iv) induction of allosteric inhibition (2; 94; 129).

Currently two protein-protein interactions involving ID domain-containing
proteins have been successfully targeted. Peptides and small molecules have been
identified that inhibit the interaction of p53 and Mdm2 (58; 225), and small molecules that
bind to monomeric, disordered c-Myc and inhibit the c-Myc-Max interaction (73; 77;
136) have been developed for cancer treatment. Currently, no small molecules targeting
protein-protein interactions in neurodegenerative diseases have been identified.

Although no small molecules targeting protein-protein interactions have been
identified, small molecules targeting the ID domain structure have been more successful
with regard to neurodegenerative disease-associated proteins. Toth and colleagues
utilized an in silico structure-based fragment mapping and docking screen to identify
compounds that would bind to one or more of eight theoretical binding pockets in the
various conformational states of the α-synuclein protein (207). The resulting compound,
ELN484228, was effective at reducing neuronal toxicity in some, but not all, of the
Parkinson's disease models tested (207). Although preliminary, this study, along with a
theoretical analysis of the Alzheimer's Aβ peptide (251), indicates that analysis of ID
domain conformational ensembles for binding pockets has the potential to reveal viable
small molecules. Sormanni and colleagues utilized a computational method to identify
complementarity determining regions (CDRs) that target specific epitopes within ID
domains (189). These CDRs were then grafted onto an antibody scaffold. The CDR
antibodies were developed for three proteins - α-synuclein, Aβ, and Islet Amyloid
Polypeptide (IAPP) - and confirmed to target these proteins by enzyme-linked
immunosorbent assay (ELISA) (189).




Conclusion 


It  was  shown  that  Htt-polyQ  aggregates  and  associated  cellular  proteins  can  be 
isolated  and  characterized  without  a  priori  knowledge  of  protein  identities  using  the 
TAPI  method.  The  proteins  associated  with  Htt-polyQ  in  mammalian  and  yeast  models 
show overlapping molecular functions and conserved ID domains. Deletion of ID
domains in two identified proteins (FUS and Sgt2) resulted in loss of association between
that protein and Htt-polyQ, suggesting that ID domains are important for cellular protein
association with Htt-polyQ.

Many of the mammalian proteins identified by TAPI are implicated in other
neurodegenerative diseases and also contain ID domains. Recent publications indicate
that ID domain-containing proteins and protein-protein interactions involving these
proteins can be targeted with small molecules and rationally designed antibodies.
However, thus far, no ID domain-targeting drugs have been identified for the mutant
huntingtin protein. Our studies indicate that huntingtin ID domain-targeting drugs
represent the next step in HD research. If successful, these therapeutics would represent
the first HD treatment to target symptoms of the disease.

CHAPTER 4: Materials and methods

Yeast  Strains  and  Plasmids 

Yeast strains BY4741 (MATa his3 leu2 met15 ura3 [PIN+] [psi-]) or W303
(MATa leu2 ade2-1 ura3 can1 trp1 his3 gal+) were transformed with galactose-inducible
Huntingtin exon 1 polyQ expansion plasmids (Htt-Q25-GFP or Htt-Q103-GFP) (109;
134). Genes (or truncated variants) were cloned into pFPS261 and pFPS262, which
encode single HA tags in-frame with the multiple cloning site, thus adding C-terminal
epitope tags to genes cloned into the XhoI site. Plasmids pFPS261 (CEN LEU2 PGAL1)
and pFPS262 (2µ LEU2 PGAL1) are derivatives of the previously described
pH316 (55) and pH317 (54), respectively. Truncated SGT2-ΔID is SGT2Δ300-346, and
truncated FUS-ΔID is FUSΔ1-135. Human α-synuclein-GFP was expressed from the
previously described plasmid DK258 (2µ LEU2 PGAL1) (113). All strains were cultured in
synthetic defined media with appropriate auxotrophic selection for plasmid maintenance.
Protein expression was induced overnight by growth on selective galactose-containing
medium.

Cell  Lines  and  Maintenance 

The PC-12 cell lines were previously described by Wyttenbach et al. (239).
Briefly, the PC-12 cells were stably transformed with doxycycline-inducible GFP-tagged
normal Htt-Q23 exon 1 or expanded Htt-Q74. Cells were cultured on collagen IV (BD
Biosciences) coated T-75 flasks and maintained in DMEM with 75 µg/mL hygromycin,
100 U/mL penicillin/streptomycin, 2 mM L-glutamine, 10% heat-inactivated horse
serum, 5% Tet-negative fetal bovine serum and 100 µg/mL G418 at 37°C, 10% CO2.
Culture reagents were obtained from Corning.

Technique for Amyloid Purification and Identification (TAPI)

The yeast TAPI protocol was performed as previously described (109) with the
following alterations: Buffer A contained 30 mM Tris-HCl pH 7.5, 5 mM DTT, 40 mM NaCl,
3 mM MgCl2, 5% glycerol, 1x Complete protease inhibitor cocktail (Roche), 20 mM
NEM (Sigma), and 0.5 µl Benzonase nuclease (250 U/µl; Sigma); samples were treated with
RNase A (200 µg/ml; Sigma) for 15 minutes at 4°C prior to ultracentrifugation at
300,000 g.

Mammalian TAPI samples were prepared as follows: a cell pellet of at least 1x10^9
cells (~100 µl in volume) was lysed in 300 µl of cold modified RIPA buffer (1% Triton
X-100, 0.1% SDS, 1% sodium deoxycholate, 150 mM NaCl, 10 mM Na3PO4, 50 mM NaF,
5 mM MgCl2, 5 mM DTT, 5 mM Na3VO4, with 1x protease inhibitors (Roche) and 33 U
DNase I (Sigma), 3 mg RNase A (Sigma) and 750 U Benzonase (Sigma)), followed by
a 10-minute incubation at room temperature and then 20 minutes of mild rotation at 4°C.
Lysates were then spun at low speed (5 minutes centrifugation at 100 g) and the pellets
were subjected to a second lysis in modified RIPA buffer with rotation for 20 minutes at
4°C. Combined supernatants were run through a 30% sucrose gradient by
ultracentrifugation (2 hours at 45000 rpm at 4°C with a Beckman SW-50A rotor). Some
samples were analyzed at this point to determine the presence of specific proteins in the
pellet fraction by western blotting. After ultracentrifugation, the pellet was resuspended
in high SDS buffer (1x TBS, 5 mM DTT, 5 mM EDTA, 2-4% SDS, with 1x protease
inhibitors (Roche)) and incubated with gentle mixing for at least 20 minutes at 37°C.
Samples were run at 200 volts on an acrylamide gel (Any kD™, Bio-Rad) in 10%
glycerol with 0.1% bromophenol blue to monitor sample migration. The top 3 millimeters
of the wells were excised and frozen. Frozen gel fragments were thawed and resuspended
in elution buffer (10 mM Tris pH 8.0, 0.4% SDS, 5 mM DTT), then mixed and incubated
at 99°C for several minutes. Sample volume was reduced by half in a speed-vac and
applied to a desalting column (Zeba spin, Thermo Scientific) pre-equilibrated with
25 mM triethylammonium bicarbonate (TeABC). The flow-through was analyzed for
elution efficiency by western blot prior to digestion.

Samples were digested for mass spectrometry (MS) analysis using a previously
described method (238) with minor modifications. Briefly, DTT was added to the sample
to obtain a 15 mM solution prior to addition of iodoacetamide to a concentration of 50
mM, followed by a 20-minute incubation at 30°C in the dark. Next, 9 M urea was added to
the sample, which was then filtered (Amicon 30K spin filter, 30 kDa cutoff, centrifuged at
16000 x g for 5-10 min), washed twice with 25 mM TeABC, and finally trypsin digested
(5-10 µg trypsin) overnight at room temperature. The sample was retrieved from the
column by centrifugation (16000 x g for 4 min) and washed with 25 mM TeABC prior to
lyophilization. Lyophilized samples were analyzed by tandem MS (MS/MS) by the Johns
Hopkins Mass Spectrometry and Proteomics Facility.

Mass  Spectrometry  Analysis 

Samples were run on a Q-Exactive (Thermo Scientific) at 70,000 resolution for MS and
17,500 for MS2, or on an Orbitrap Velos (Thermo Scientific) at 30,000 resolution for MS
and 15,000 for MS2. The data were collected in data-dependent mode with
the top 15 precursors chosen for MS/MS. The peptides were eluted with a 90 minute
gradient at 300 nanoliters per minute after trapping and desalting for 5 minutes at 5
microliters per minute. Peptides were fragmented with a normalized collision energy of
27 for the Q-Exactive and 35 for the Orbitrap Velos. For the Q-Exactive, target values
were 3E6 ions with a 60 millisecond maximum injection time for MS and 5E4 ions with
250 milliseconds for MS2. For the LTQ Orbitrap Velos, target values were 1E6 ions with
a 100 millisecond maximum injection time for MS and 5E4 ions with 300 milliseconds for
MS2.

All data were searched using Mascot (v2.6, Matrix Science) through Proteome
Discoverer (v1.4, Thermo Scientific). The database for yeast included Htt-Q25-GFP in
addition to the RefSeq 2014 Saccharomyces cerevisiae database, and the database for rat
included GFP-Htt-Q23 in addition to the RefSeq 2014 Rattus norvegicus database. Variable
modifications included oxidation on Met, deamidation on N and Q, and
carbamidomethylation on C. Data were searched with a 30 part per million (ppm)
tolerance for precursor mass and 0.03 daltons for fragment masses. Data were searched
with and without the MS2 processor node, which deisotopes the MS2 spectra to the +1
charge state prior to searching. Data were filtered through the Target Decoy PSM
Validator.

The resulting data were filtered through Scaffold software for Total Spectra Count
at 5% FDR. Criteria for proteins to be defined as associated with Htt-Q74/103 (Htt-polyQ
aggregate) were as follows: 2 or more total spectra, and present in the expanded
Htt-polyQ aggregate while absent in the short Htt-polyQ control sample (henceforth termed
binary) in at least 2 of 4 samples examined. Thus, for a protein to be considered
positive, it must show ≥2 spectra in the Q-long sample and none in the Q-short sample in
at least 2 of the matched sample pairs (each pair consisting of a Q-short and a Q-long
sample processed at the same time on the same instrument). This resulted in 52 Htt-polyQ
aggregate-associated proteins identified in Saccharomyces cerevisiae (Table 1; Appendix
1) and 91 Htt-polyQ aggregate-associated proteins identified in Rattus norvegicus (Table
3; Appendix 2). In the accompanying supplementary files, we also include the proteins
that meet a less stringent threshold: present in the Htt-polyQ aggregate sample while
absent in the control for at least one sample set (binary) and 2-fold greater spectra
number in Htt-Q74/103 (Htt-polyQ aggregate) than the short Htt-polyQ control in at least
one additional sample pair. As a control for the above method of processing, we also used
normalized spectral counting (NCS), a label-free quantification method that compares the
number of MS/MS spectra assigned to each protein normalized for the total spectral
counts among samples (150; 253). The NCS processing yielded very similar data sets
(data not shown).
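
The stringent "binary" criterion described above can be illustrated with a short Python sketch. This is not the processing pipeline used in this work (filtering was done in Scaffold); the function names and the example counts are hypothetical, and the normalization shown is only one simple reading of normalized spectral counting, which the cited methods may implement differently.

```python
from typing import Dict, List, Tuple

# One matched run = (Q-short spectra, Q-long spectra) for a single protein,
# acquired at the same time on the same instrument.
PairedCounts = List[Tuple[int, int]]


def is_binary_hit(q_short: int, q_long: int, min_spectra: int = 2) -> bool:
    """True when the protein has >= min_spectra in Q-long and none in Q-short."""
    return q_long >= min_spectra and q_short == 0


def is_aggregate_associated(pairs: PairedCounts, min_pairs: int = 2) -> bool:
    """Stringent rule: the protein is binary in at least `min_pairs` matched runs."""
    return sum(is_binary_hit(short, long_) for short, long_ in pairs) >= min_pairs


def normalized_counts(sample: Dict[str, int]) -> Dict[str, float]:
    """Divide each protein's spectra by the total spectra in that sample.

    Assumption: a simple total-count normalization, used here for illustration.
    """
    total = sum(sample.values())
    return {name: (n / total if total else 0.0) for name, n in sample.items()}


# Hypothetical counts for illustration only (not data from this study).
example = {
    "protein_A": [(0, 6), (0, 2), (1, 4), (0, 0)],  # binary in 2 of 4 pairs
    "protein_B": [(0, 1), (2, 8), (0, 3), (0, 0)],  # binary in 1 of 4 pairs
}
hits = [name for name, pairs in example.items() if is_aggregate_associated(pairs)]
```

In this sketch a pair such as (0, 1) does not count as binary because the Q-long sample has fewer than 2 spectra, and a pair such as (2, 8) does not count because the protein is also detected in the Q-short control.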

Bioinformatic  Analysis 

The proteins identified to be associated with polyQ aggregates using TAPI and
MS were further characterized by molecular function, Q/N content and intrinsic disorder.
Gene Ontology, the Saccharomyces Genome Database (SGD) and the Rat Genome Database
(RGD) were used to determine the molecular function of each protein (5; 28; 117).
Q/N-rich regions were defined as 30 or more Q/N in an 80-amino acid stretch (137). For the
cases in which data were not available, we developed a Perl-based algorithm to
examine protein sequences for Q/N-rich regions (Appendix 3).
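
The Q/N-rich definition above (30 or more Q/N in an 80-amino acid stretch) lends itself to a sliding-window scan. The sketch below is written in Python for illustration and is not the Perl script of Appendix 3; the function name, the merging of overlapping windows into a single span, and the treatment of sequences shorter than 80 residues (reported as having no Q/N-rich region) are assumptions.

```python
from typing import List, Tuple


def qn_rich_regions(seq: str, window: int = 80, min_qn: int = 30) -> List[Tuple[int, int]]:
    """Return merged 1-based (start, end) spans covered by any `window`-residue
    stretch that contains at least `min_qn` glutamine (Q) or asparagine (N)."""
    seq = seq.upper()
    if len(seq) < window:
        return []  # assumption: too short to contain a full window
    is_qn = [1 if aa in ("Q", "N") else 0 for aa in seq]
    count = sum(is_qn[:window])  # Q/N count in the first window
    spans: List[Tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide by one residue: add the entering residue, drop the leaving one.
            count += is_qn[start + window - 1] - is_qn[start - 1]
        if count >= min_qn:
            end = start + window  # 1-based inclusive end of this window
            if spans and start + 1 <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], end)  # extend the previous span
            else:
                spans.append((start + 1, end))
    return spans


# Synthetic sequence for illustration only: 30 consecutive Q flanked by alanines.
toy = "A" * 50 + "Q" * 30 + "A" * 50
regions = qn_rich_regions(toy)
```

Because every 80-residue window of the toy sequence contains all 30 glutamines, the merged span covers the whole sequence; a real analysis might instead report only the densest window.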

Long intrinsically disordered (ID) domains were determined using the IUPred-L
structural prediction algorithm; ID domains were defined as 30 or more amino acids with
a disorder score of 0.5 or greater (50). To approximate the percentage of proteins in the
yeast and rat genomes with long ID regions, 100 and 200 proteins respectively (~ twice
the size of each sample data set; Appendices 1 and 2) were randomly selected using a
random number generator in alignment with the full proteomes of yeast (S. cerevisiae)
and rat (R. norvegicus) downloaded from UniProt (www.uniprot.org). Domains with
intrinsic disorder were evaluated for all proteins by IUPred-L. A 2x2 contingency
analysis (Chi-Square, Fisher's Exact test; GraphPad software) was used to determine if
proteins with long ID domains (>100 amino acids) were significantly enriched in polyQ
aggregates. Analysis was also performed to ensure that the TAPI methodology was not
biased towards identifying proteins that are abnormally large or abundant. The size (kDa)
and cellular abundance (molecules/cell) were determined for each protein in the yeast
sample set and compared with the whole proteome (values accumulated from (62; 64; 122;
144); Figure 4).
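
As an illustration of the ID domain definition above, the following Python sketch extracts disordered stretches from a list of per-residue disorder scores. It assumes the scores have already been produced by a predictor such as IUPred-L (the sketch does not run IUPred itself) and that the 30 or more residues must be consecutive; the function names and the synthetic scores are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def disordered_domains(scores: Sequence[float], cutoff: float = 0.5,
                       min_len: int = 30) -> List[Tuple[int, int]]:
    """Return 1-based (start, end) runs of at least `min_len` consecutive
    residues whose disorder score is >= `cutoff`."""
    domains: List[Tuple[int, int]] = []
    run_start: Optional[int] = None  # 0-based index where the current run began
    for i, score in enumerate(scores):
        if score >= cutoff:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                domains.append((run_start + 1, i))
            run_start = None
    # Close a run that extends to the C-terminus.
    if run_start is not None and len(scores) - run_start >= min_len:
        domains.append((run_start + 1, len(scores)))
    return domains


def longest_domain(scores: Sequence[float], cutoff: float = 0.5,
                   min_len: int = 30) -> int:
    """Length of the longest ID domain, or 0 if none meets the definition."""
    spans = disordered_domains(scores, cutoff, min_len)
    return max((end - start + 1 for start, end in spans), default=0)


# Synthetic scores for illustration only: one 35-residue and one 20-residue run.
toy_scores = [0.2] * 10 + [0.8] * 35 + [0.1] * 5 + [0.9] * 20
toy_domains = disordered_domains(toy_scores)
```

A protein could then be binned by `longest_domain` (for example, greater than 50, 100, 150 or 200 residues) to mirror the length categories tabulated in the appendices.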

Western  Blotting 

Western blotting of cell lysates (input) and TAPI-purified samples was used to
verify high molecular weight protein aggregation (observed as large species that cannot
migrate beyond the top of an acrylamide stacking gel) and confirm that specific proteins
are trapped in polyQ aggregates. Standard Western blotting techniques were employed
using nitrocellulose or PVDF membranes, which were probed with primary antibodies
against the following targets at dilutions of ~1:5000: αGFP (Roche), αHA (Santa Cruz
and Sigma), αErk (Santa Cruz sc-93), αFUS (Bethyl), αhnRNPA1 (Cell Signaling),
αRAD23B (Protein Tech), αTDP43 (Protein Tech), and αUBQLN2 (Novus Bio - 5f5).
Appropriate HRP-conjugated secondary antibodies were used at 1:1000 dilutions,
followed by HRP chemiluminescent substrate (Pierce ECL) for visualization.

Lysate  Partitioning 

Analysis of proteins (Bmh1p, Def1p, FUS, Ent2p and Sgt2p) entrapped within
aggregates was performed by observing the fraction of protein in the total lysate,
supernatant or pellet fraction that partitioned to the stacking well of an acrylamide gel
under standard SDS electrophoresis conditions. Briefly, yeast lysates were prepared by
mechanical breakage using glass beads in TAPI Buffer A (with RNase A). Pellet and
supernatant fractions were prepared as described in the TAPI methodology
above. Cellular fractions were subjected to SDS-PAGE using Any kD™ gels (Bio-Rad)
followed by Western blotting. The effectiveness of the TAPI buffer in eliminating RNA
from the aggregates was examined by RNase-free agarose gel electrophoresis with and
without nuclease treatment (Figure 3A). The FUS protein is prone to degradation
following cell lysis; thus, denaturing buffer (10 mM Tris, pH 7.5, 8 M urea) was used to
visualize protein levels under conditions in which degradation is greatly inhibited. This
enables confirmation that the lysates contain equivalent initial amounts of FUS.

Confocal  Microscopy 

Confocal microscopy samples were prepared on poly-lysine coated slides with
1x10^6 cells/spot, fixed with 4% paraformaldehyde in 1x PBS and permeabilized with
0.1% Triton X-100 in PBS. Primary antibodies for RAD23B (Bethyl) or FUS (Bethyl)
were added at 1:200 dilution, followed by type-specific Alexa Fluor conjugated
secondary antibodies (α-rabbit 647 and α-mouse 568, Southern Biotech) at 1:1000
dilution in 1% fetal calf serum in 1x PBS. Sample slides were mounted with
DAPI-containing Fluoromount (EMS), viewed on a Zeiss 710 confocal laser scanning
microscope and analyzed using Zen software (2009).

Thioflavin-T  Analysis 

Thioflavin-T (Th-T; Sigma) fluorescence was used to determine if HttQ103-GFP
aggregates adopted an amyloid-like conformation. Purified Sup35-NM fibres (amyloid
positive control), HttQ25-GFP, and HttQ103-GFP samples were treated with 1 µg Th-T in
50 mM Tris pH 8.0, 50 mM NaCl buffer in a black 96-well plate. Samples were analyzed
on a BioTek Synergy H1 plate reader using excitation at 440 nm and emission at 490
nm. To determine if Thioflavin-T fluorescence was significantly different between
HttQ25-GFP and HttQ103-GFP, a two-tailed t-test was used.

Determination  of  overlapping  Htt-polyQ  associated  proteins  from  multiple  studies 

Htt-polyQ-associated proteins identified by TAPI (231), SILAC (155), and
antibody isolation with 2D gel analysis (228) were collated and overlapping proteins
were identified. These overlapping hits were then compared to the four overexpression
and deletion studies (66; 101; 132; 237). This cohort was further analyzed for molecular
function, Q/N content, and ID domains. Gene Ontology and the Saccharomyces Genome
Database (SGD) were used to determine the molecular function of each protein (5; 28;
117). Q/N-rich regions were defined as 30 or more Q/N in an 80-amino acid stretch
(137). For the cases in which data were not available, we developed a Perl-based
algorithm to examine protein sequences for Q/N-rich regions (Appendix 3).
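
Collating hits across studies amounts to counting how many independent protein lists contain each protein. The Python sketch below illustrates one way to do this; the study labels and protein names are placeholders rather than data from the cited studies, and the choice to call a protein "overlapping" when it appears in at least two lists is an assumption.

```python
from collections import Counter
from typing import Dict, Set


def proteins_in_multiple_studies(studies: Dict[str, Set[str]],
                                 min_studies: int = 2) -> Dict[str, int]:
    """Map each protein found in at least `min_studies` lists to the number
    of lists that contain it (sorted by protein name)."""
    counts = Counter(protein for hits in studies.values() for protein in hits)
    return {p: n for p, n in sorted(counts.items()) if n >= min_studies}


# Placeholder identifiers for illustration only.
example_studies = {
    "purification_A": {"P1", "P2", "P3"},
    "labeling_B": {"P2", "P3", "P4"},
    "pulldown_C": {"P3", "P5"},
}
overlap = proteins_in_multiple_studies(example_studies)
```

In practice, protein names would need to be mapped to a common identifier (for example, systematic ORF names) before comparison, since different studies may report the same protein under different aliases.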

Long intrinsically disordered (ID) domains were determined using the IUPred-L
structural prediction algorithm; ID domains were defined as 30 or more amino acids with
a disorder score of 0.5 or greater (50). To approximate the percentage of proteins in the
yeast genome with long ID domains, 100 proteins (~ twice the
size of the sample data set; Appendix 1) were randomly selected using a random number
generator in alignment with the full proteome of yeast (S. cerevisiae) downloaded from
UniProt (www.uniprot.org). Domains with intrinsic disorder were evaluated for all
proteins by IUPred-L. A 2x2 contingency analysis (Chi-Square, Fisher's Exact test;
GraphPad software) was used to determine if proteins with long ID domains (>100 amino
acids) were significantly enriched in polyQ aggregates.
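
The enrichment comparison above can be sketched in Python as a random draw of background proteins followed by a two-sided Fisher's exact test on a 2x2 table. The analysis in this work was run in GraphPad; the code below is an independent illustration using only the standard library, and the function names, the seed, and the counts in the example are hypothetical.

```python
import random
from math import comb
from typing import List, Sequence


def sample_background(protein_ids: Sequence[str], k: int, seed: int = 0) -> List[str]:
    """Draw k proteins without replacement from a proteome list."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(list(protein_ids), k)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows: aggregate-associated set, background sample.
    Columns: proteins with a long ID domain, proteins without one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x long-ID proteins in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts for illustration only (not results from this study).
p_value = fisher_exact_two_sided(30, 20, 25, 75)
```

The two-sided p-value here sums all tables no more probable than the observed one, which is a common convention; other software may define the two-sided test slightly differently.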
APPENDICES


Appendix 1: Supplemental File S1 - Proteins identified by mass spectrometry following TAPI purification of polyglutamine
aggregates from yeast cells.


Supplemental  File  S1.1:  Biophysical  Characterization  of  Htt-polyQ  aggregate-associated  proteins  identified  by  Mass  Spec  analysis  of  TAPI  samples  in  S.  cerevisiae. 
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Ser-Thr  protein  kinaiYBR059C 

gi  1 6319533 

124  kDa 

6 

2 

4 

2 

0 

8 

#DIV/0! 
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Bmhlp 

14-3-3  protein,  majoYER177W 
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Cbklp 

Serine/threonine  proYNL161W 
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Ccr4p 
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Cyc8p 

General  transcriptiorYBR112C 
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12 

#DIV/0! 

0 

Dhhlp 
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#DIV/0! 

0 
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Ent2p 
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#DIV/0! 

0 

7 
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Gal2p 

Galactose  permease  YLR081W 
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Protein  involved  in  AYGL181W 
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Hrplp 

RRM-containing  het<YOL123W 
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#DIV/0! 
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Ixrlp 

Transcriptional  repreYKL032C 

gi  1 398364635 

68  kDa 
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0 
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0 

Mbflp 

Transcriptional  coac  YOR298C-A 

gi  1398366153 

16  kDa 
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0 

Mcalp 

Ca2+-dependent  cytYOR197W 

gi  1 398365705 

48  kDa 
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3 

#DIV/0! 

0 
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#DIV/0! 

0 
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Transcriptional  repreYMR070W 
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54  kDa 

0 

2 

#DIV/0! 

0 

Mum2p 

cytoplasmic  protein;  YBR057C 

gi  16319531 

41  kDa 

0 

6 

#DIV/0! 

0 
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RNA-binding  protein  YPL190C 

gi  16325066 

90  kDa 

0 

44 

#DIV/0! 

0 

2 
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0 
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RNA  binding  protein  YHR086W 
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57  kDa 
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0 

2 
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0 

Newlp 
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gi  16325030 

134  kDa 

2 

2 

16 

8 

0 

10 

#DIV/0! 

0 

Nrdlp 

RNA-binding  subuni  YNL251C 

gi  16324078 

64  kDa 

0 

44 

#DIV/0! 

0 

NuplOOp 

FG-nucleoporin  comYKL068W 

gi  16322782 

100  kDa 

0 

4 

#DIV/0! 

0 

Nupll6p 

FG-nucleoporin  comYMR047C 

gi  16323691 

116  kDa 

2 

8 

4 

0 

10 

#DIV/0! 

0 

Nup57p 

FG-nucleoporin  comYGR119C 

gi  16321557 

57  kDa 

2 

0 

2 

#DIV/0! 

0 

Panlp 

Part  of  actin  cytoskeYIR006C 

gi  1398364573 

160  kDa 

6 

2 

8 

4 

0 

26 

#DIV/0! 

0 

Pbplp 

Component  of  gluco  YGR1 78C 

gi  1398366039 

79  kDa 

0 

7 

#DIV/0! 

0 

6 

#DIV/0! 

2 

Pgdlp 

Subunit  of  the  RNA  |  YGL025C 

gi  137362652 

43  kDa 

0 

4 

#DIV/0! 

0 

Pin3p 

Negative  regulator  OYPR154W 

gi  16325412 

24  kDa 

2 

0 

2 

#DIV/0! 

0 

Pin4p 

Contains  an  RNA  re<YBL051C 

gi  16319420 

74  kDa 

0 

2 

#DIV/0! 

0 

Pop2p 

RNase  of  the  DEDD  YNR052C 

gi  1398365813 

50  kDa 

0 

2 

#DIV/0! 

0 

Puf3p 

m RNA-binding  proteYLL013C 

gi  16323016 

98  kDa 

4 

0 

2 

#DIV/0! 

0 

9 

#DIV/0! 

0 

Scd5p 

Protein  required  for  IYOR329C 

gi  1398366267 

97  kDa 

0 

2 

#DIV/0! 

0 

Sec24p 

Component  of  the  SiYIL109C 

gi  1398364269 

104  kDa 

4 

0 

6 

#DIV/0! 

0 

Sgt2p 

Glutamine-rich  cytopYOR007C 

gi  16324580 

37  kDa 

2 

0 

32 

#DIV/0! 

0 

24 

#DIV/0! 

4 

Slalp 

Cytoskeletal  protein  YBL007C 

gi  16319464 

136  kDa 

0 

6 

#DIV/0! 

0 

13 

#DIV/0! 

0 

Slmlp 

Phosphoinositide  PI-YIL105C 

gi  16322086 

78  kDa 

2 

0 

2 

#DIV/0! 

0 

Smllp 

Ribonucleotide  redU'YML058W 

gi  16323582 

12  kDa 

0 

2 

#DIV/0! 

0 

Snf5p 

Subunit  of  the  SWI/S  YBR289W 

gi  1398365913 

103  kDa 

2 

0 

4 

#DIV/0! 

0 

Srp54p 

Interacts  with  the  SFYPR088C 

gi  16325345 

60  kDa 

6 

0 

2 

#DIV/0! 

0 

8 

#DIV/0! 

0 

Taf5p 

Subunit  (90  kDa)  of  YBR198C 

gi  16319675 

89  kDa 

2 

0 

3 

#DIV/0! 

0 

Tuplp 

General  repressor  O  YCR084C 

gi  16319926 

78  kDa 

0 

19 

#DIV/0! 

0 

Whi3p 

RNA  binding  protein  YNL197C 

gi  16324132 

71  kDa 

0 

3 

#DIV/0! 

0 

Yapl801p 

Protein  of  the  AP18CYHR161C 

gi  16321955 

72  kDa 

0 

8 

#DIV/0! 

0 

Yapl802p 

Protein  of  the  AP18CYGR241C 

gi  1398366269 

64  kDa 

0 

2 

#DIV/0! 

0 

2 

#DIV/0! 

0 

Vdjlp 

Type  1  HSP40  co-ch,YNL064C 

gi  16324265 

45  kDa 

0 

29 

#DIV/0! 

0 

12 

#DIV/0! 

0 

YGR250C 

Putative  RNAbindiniYGR250C 

gi  1398366299 

90  kDa 

2 

0 

2 

#DIV/0! 

0 

YLR177W 

Putative  protein  of  u  YLR177W 

gi  16323206 

71  kDa 

0 

25 

#DIV/0! 

0 

YMR124W 

Protein  involved  in  s  YMR124W 

gi  16323772 

106  kDa 

6 

0 

7 

#DIV/0! 

0 

4 

#DIV/0! 

0 

YNL208W 

Protein  of  unknown  1YNL208W 

gi  142742305 

20  kDa 

9 

0 

4 

#DIV/0! 

0 

1 

1  1 

1  1 
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otal  Spectra  If 

1 _ 

Q103 

Ratio  (4/14) 

%Q  Q/N-Rich 

ID  AA 

%ID 

AA 

PI 

ID  domain  lUPred-L 

ID  >50 

ID  >100 

ID  >150 

ID  >200 

RBP 

RBP  W/ID 

PQC 

18 

#DIV/0! 

8.75  655-792 

709 

64% 

1108 

7.73 

79,  620 

79,  620 

620 

620 

620 

4 

#DIV/0! 

4.92  None 

48 

9% 

528 

6.87 

42 

Yes 

51 

#DIV/0! 

6.74  None 

55 

21% 

267 

4.65 

35 

Yes 

20 

#DIV/0! 

12.57  143-302 

86 

11% 

756 

8.01 

263 

263 

263 

263 

263 

23 

#DIV/0! 

7.29  9-144 

268 

32% 

837 

6.84 

115,  59,  42 

115,  59 

115 

Yes 

Yes 

4 

#DIV/0! 

17.18  441-682 

599 

62% 

966 

4.76 

40, 172,  371 

172,  371 

172,  371 

172,  371 

371 

Yes 

Yes 

34 

#DIV/0! 

23.17  335-719 

610 

83% 

738 

4.76 

387,  41,  32,  59 

387,  59 

387 

387 

387 

Yes 

7 

#DIV/0! 

8.1  None 

117 

23% 

506 

7.97 

70 

70 

Yes 

Yes 

15 

#DIV/0! 

7.44  None 

553 

88% 

632 

10.3 

41,  85,  95,  295 

85,  95,  295 

295 

295 

295 

Yes 

Yes 

29 

#DIV/0! 

15.2  202-360 

303 

67% 

454 

5.66 

99,  42,  71,  84 

99,  71,  84 

22 

11 

24.31  214-599 

450 

73% 

613 

5.21 

81,  299 

81,  299 

299 

299 

299 

12 

#DIV/0! 

3.31  None 

41 

7% 

574 

7.21 

None 

16 

#DIV/0! 

12.12  278-359 

116 

29% 

396 

7.46 

43,  39 

6 

#DIV/0! 

8.99  None 

288 

54% 

534 

5.36 

156,  54, 148 

156,  54,  148 

156, 148 

156 

Yes 

Yes 

12 

#DIV/0! 

20.77  1-383 

502 

84% 

597 

8.45 

158,  199, 106 

158, 199, 106 

158, 199,  106 

158, 199 

Yes 

Yes 

2 

#DIV/0! 

4.64  None 

171 

113% 

151 

11.35 

106 

106 

106 

Yes 

6 

#DIV/0! 

11.34  15-108 

65 

15% 

432 

5.09 

128 

128 

128 

Yes 

2 

#DIV/0! 

26.92  107-239 

243 

85% 

286 

4.86 

207 

207 

207 

207 

207 

Yes 

Yes 

6 

#DIV/0! 

9.59  1-90,  70-204,  133-227 

143 

29% 

490 

9.2 

169, 133,  50 

169, 133 

169, 133 

169 

Yes 

Yes 

12 

#DIV/0! 

4.92  None 

200 

55% 

366 

5.84 

97,  40 

97 

27 

#DIV/0! 

10.35  646-779 

609 

76% 

802 

4.22 

217,  83,  256 

217,  83,  256 

217,  256 

217,  256 

217,  256 

Yes 

Yes 

10 

#DIV/0! 

7.07  None 

83 

16% 

523 

8.45 

None 

Yes 

48 

#DIV/0! 

4.26  15-129 

199 

17% 

1196 

5.67 

32,  34 

Yes 

Yes 

8 

#DIV/0! 

7.48  None 

169 

29% 

575 

9.96 

103, 108 

103, 108 

103, 108 

Yes 

Yes 

2 

#DIV/0! 

6.36  258-393 

741 

77% 

959 

10.15 

52,  120,  377,  46,  34,  50 

52, 120,  377 

120,  377 

377 

377 

18 

#DIV/0! 

9.88  264-345,  385-467,  394 

389 

35% 

1113 

10.13 

96,  197,  392,  86 

96,  197,  392,  86 

197,  392 

197,  392 

392 

16 

#DIV/0! 

11.09  207-291 

323 

60% 

541 

10.39 

260 

260 

260 

260 

260 

31 

#DIV/0! 

10.2  1-120 

1097 

74% 

1480 

5.02 

231,  75,  51,  37,  36,  509 

231,  75,  51,  509 

231,  509 

231,  509 

231,  509 

16 

8 

3.19  None 

473 

66% 

722 

6.91 

43,  45,  90,  88,  30,  42 

90,  88 

Yes 

Yes 

2 

#DIV/0! 

10.33  272-359 

274 

69% 

397 

9.13 

256 

256 

256 

256 

256 

6 

#DIV/0! 

13.49  None 

126 

59% 

215 

7.55 

88 

88 

8 

#DIV/0! 

10.03  415-508 

465 

70% 

668 

6.21 

78,  93,  32, 161,  34 

78,  93, 161 

161 

161 

Yes 

Yes 

7 

#DIV/0! 

14.55  11-162 

123 

28% 

433 

6.11 

82,  38 

82 

Yes 

Yes 

24 

#DIV/0! 

7.74  379-480 

383 

44% 

879 

7.21 

146,  198 

146, 198 

146, 198 

198 

Yes 

Yes 

6 

#DIV/0! 

11.93  598-695,  734-839 

307 

35% 

872 

7.94 

53,  45,  36,  476 

53,  476 

476 

476 

476 

34 

#DIV/0! 

6.26  None 

40 

4% 

926 

6.16 

134 

134 

134 

Yes 

65 

16 

4.62  None 

164 

47% 

346 

4.51 

48,81 

81 

Yes 

15 

#DIV/0! 

6.67  None 

812 

65% 

1244 

5.69 

124,  71,  40,  40, 143,  34,  84,  77 

124,  71, 143,  84,  77 

124, 143 

Yes 

8 

#DIV/0! 

11.08  1-129 

271 

40% 

686 

8.16 

117,  81 

117,  81 

117 

6 

#DIV/0! 

8.65  None 

76 

73% 

104 

4.55 

None 

8 

#DIV/0! 

16.69  1-95,  113-330 

460 

51% 

905 

8.49 

131,  31,  54,  46, 110 

131,  54,  110 

131, 110 

Yes 

Yes 

42 

#DIV/0! 

7.39  None 

147 

27% 

541 

9.04 

71 

71 

Yes 

Yes 

11 

#DIV/0! 

4.39  None 

256 

32% 

798 

7.39 

134,  56 

134,  56 

134 

Yes 

Yes 

Yes 

4 

#DIV/0! 

8.56  None 

326 

46% 

713 

5.47 

192,  36 

192 

192 

192 

Yes 

Yes 

Yes 

14 

#DIV/0! 

7.41  None 

347 

52% 

661 

8.53 

42,  125, 146,  32,  35,  41 

125, 146 

125, 146 

Yes 

Yes 

2 

#DIV/0! 

11.46  484-595 

307 

48% 

637 

7.04 

73,  200 

73,  200 

200 

200 

200 

8 

#DIV/0! 

12.15  None 

293 

52% 

568 

6.77 

50,  91,  86 

91,  86 

22 

#DIV/0! 

3.42  None 

162 

40% 

409 

6.23 

None 

Yes 

16 

#DIV/0! 

5.38  None 

233 

30% 

781 

5.1 

41,  32 

Yes 

Yes 

4 

#DIV/0! 

7.17  125-224 

191 

30% 

628 

9.38 

59,  63 

59,  63 

16 

#DIV/0! 

8.27  17-139,  301-383 

515 

55% 

943 

6.88 

143,  37,  345,  70,  45,  33,  32 

143,  345,70 

143,  345 

345 

345 

36 

#DIV/0! 

11.06  None 

168 

84% 

199 

7.9 

53,  124 

53,  124 

124 

1 

8.62  54% 

48% 

686 

7.28 

48/52  =  92% 

43/52  =  83% 

33/52  =  63% 

22/52  =  42% 

16/52  =  31% 

46% 

96% 

19% 
Appendix 2: Supplemental File S2 - Proteins identified by mass spectrometry following TAPI purification of polyglutamine
Supplemental File S2.1: Biophysical Characterization of Htt-polyQ aggregate-associated proteins identified by Mass Spec analysis of TAPI samples in R. norvegicus.
Note:  Rows  highlighted  in  blue  are  disease-linked  proteins. Total  Sp  J Total  Sp  Total  Spectra  [Total  Sp  Total  Spectra  [Total  Sp  Total  Spectra  1 


Identified  Pr Protein  Function  NCBI  Ref  Seq 

Accession  # 

Mol  Weight 

Q74  (Ap 

Q23  Q74 

Ratio  (Ju 

23 Q  74Q 

Ratio  (A 

23Q 

74Q 

Ratio  (Ju 

Q/N-rich

ID  Domain  >30 

IDD  >50 

IDD  >100 

IDD  >150 

IDD  >200 

RBP 

RBP  W/ID 

PQC 

Aak1

AP2-associated  protein  kinase  1  |NP  001100361.1 

gi|291045113  (+2) 

104  kDa 

0 

3 

#DIV/0! 

0 

4  #DIV/0! 

399-620 

209,81,  50 

209,  81,  50 

209 

209 

209 

Aars2 

alanine--tRNA  ligase,  mitochondriNP  861433.2 

gij  15781 8067  (+2) 

108  kDa 

0 

2  #DI V/0! 

0 

14 

#DIV/0! 

None 

None 

yes 

Acad9 

acyl-CoA  dehydrogenase  family  rNP  001030123.1 

gi|1 9731 3734  (+1) 

69  kDa 

0 

6  #DIV/0! 

0 

22 

#DIV/0! 

None 

None 

Acsf2 

acyl-CoA  synthetase  family  memlNP  113896.1 

gi|77993368 

68  kDa 

0 

4  #DI V/0! 

0 

4 

#DIV/0! 

None 

None 

Adrm1

proteasomal  ubiquitin  receptor  ACXP  006241116.1 

gi|1 3928990  (+1) 

42  kDa 

0 

9 

#DIV/0! 

2 

6  3 

0 

6 

#DIV/0!

None 

39,  36 

yes 

Akap8l 

PREDICTED:  A  kinase  (PRKA)  aiXP  006241116.1 

gi|564358445  (+2) 

68  kDa 

0 

2 

#DIV/0! 

0 

2  #DI V/0! 

None 

119,  100 

119,  100 

119,  100 

yes 

yes 

Arl6ip4 

ADP-ribosylation  factor-like  proteiNP  001020801.1 

gi|71 043808 

26  kDa 

0 

4 

#DIV/0! 

0 

2  #DI V/0! 

None 

199 

199 

199 

199 

yes 

yes 

Asap1

PREDICTED:  arf-GAP  with  SH3  (NP  075581.1 

gi|564307456  (+2) 

109  kDa 

5 

0 

2 

#DI  V/0! 

0 

4  #DI V/0! 

None 

364 

364 

364 

364 

364 

Atp5a1 

ATP  synthase  subunit  alpha,  mitoXP  008756735.1 

gi|40538742 

60  kDa 

0 

2 

#DIV/0! 

0 

2  #DI V/0! 

4 

20 

5.00 

None 

None 

yes 

Atrx 

PREDICTED:  transcriptional  reguXP  008756735.1 

gi|564322975  (+10) 

279  kDa 

0 

13 

#DIV/0! 

0 

5 

#DIV/0! 

None 

126 

126 

126 

yes 

Clint1

PREDICTED:  clathrin  interactor  1  NP  001002022.1 

gi|564371473 

68  kDa 

0 

60 

#DI  V/0! 

0 

18  #DI V/0! 

0 

91 

#DIV/0!

None 

124,  35,  46 

124 

124 

Cltc

PREDICTED:  clathrin  heavy  chaiiXP  006247169.1 

gi|564373736  (+3) 

192  kDa 

0 

5 

#DIV/0! 

0 

11 

#DIV/0! 

None 

None 

yes 

Cnn2 

PREDICTED:  calponin-2-like  [RalXP  006241100.1 

gi|564358403 

29  kDa 

0 

2 

#DI  V/0! 

0 

3  #DI V/0! 

None 

None 

Ddi2 

PREDICTED:  protein  DDI1  homoXP  006239367.1 

gi|564353874  (+3) 

106  kDa 

0 

4 

#DIV/0! 

0 

3  #DI V/0! 

0 

5 

#DIV/0! 

None 

38 

yes 

Ddx5 

probable  ATP-dependent  RNAheNP  001007614.1 

gi|56090441 

69  kDa 

2 

0 

3 

#DIV/0! 

0 

4 

#DIV/0! 

None 

36,  36 

yes 

yes 

Dnaja2 

dnaJ  homolog  subfamily  A  memb  NP  114468.2 

gi|56799412 

46  kDa 

0 

24 

#DI  V/0! 

0 

10  #DIV/0! 

0 

18 

#DIV/0! 

None 

62 

62 

yes 

Dnaja4 

dnaJ  homolog  subfamily  A  memb  NP  001020582.1 

gi|70794764 

62  kDa 

0 

4 

#DIV/0! 

0 

2  #DI V/0! 

None 

209 

209 

209 

209 

209 

yes 

Dnajb1

dnaJ  homolog  subfamily  B  membNP  001101911.1 

gij  1578231 65  (+1) 

42  kDa 

0 

3 

#DIV/0! 

0 

4  #DIV/0! 

None 

30 

yes 

Dnajc7 

dnaJ  homolog  subfamily  C  membNP  998790.1 

gi|47155561  (+3) 

57  kDa 

0 

2 

#DI  V/0! 

0 

2  #DI V/0! 

0 

5 

#DIV/0! 

None 

None 

yes 

Dync1h1

cytoplasmic  dynein  1  heavy  chair  NP  062099.3 

gijl48491097 

532  kDa 

0 

8 

#DIV/0! 

0 

84 

#DIV/0! 

None 

None 

yes 

Eif4g1 

PREDICTED:  eukaryotic  translati  XP  006248643.1 

gi|564314679  (+11) 

176  kDa 

0 

2 

#DI  V/0! 

0 

3 

#DIV/0!

None 

91,356,  75,  83,  30,247 

91,356,  75,  83,247 

356, 247 

356,  247 

356, 247 

yes 

yes 

Eif4g2 

eukaryotic  translation  initiation  facNP  001017374.2 

gijl  53267461 

102  kDa 

0 

11 

#DIV/0! 

0 

3  #DI V/0! 

0 

4 

#DIV/0! 

None 

35,54,  105 

54,  105 

105 

yes 

yes 

Ep300 

PREDICTED:  histone  acetyltrans  XP  001076610.1 

gi|  109482524  (+1) 

264  kDa 

0 

20 

#DIV/0! 

0 

2  #DI V/0! 

0 

7 

#DIV/0! 

None 

230,  58,  38,  48,  50,  413,  65,  257 

230,  58,  50,  413,  65,  257 

230,  413,  257 

230,  413,  257 

230,  413,  257 

yes 

Etfb 

electron  transfer  flavoprotein  subi  NP  001004220.1 

gi|51 948412 

28  kDa 

0 

2 

#DI  V/0! 

0 

2  #DI V/0! 

9 

40 

4.44 

None 

None 

Fasn 

fatty  acid  synthase  [Rattus  norvecNP_059028.1 

gi|8394158 

273  kDa 

0 

3  #DIV/0! 

0 

12 

#DIV/0! 

None 

None 

yes 

Fus 

PREDICTED:  RNA-binding  proteiXP  006230352.1 

gi|564331 043  (+1) 

53  kDa 

7 

0 

17 

#DIV/0! 

0 

7  #DIV/0! 

0 

23 

#DIV/0! 

None 

284,  71,  87 

284,  71,  87 

284 

284 

284 

yes 

yes 

Gigyf2 

PREDICTED:  PERQ  amino  acid-iXP  003754591.1 

gi|392342451  (+3) 

151  kDa 

1  1 

28 

28 

0 

5  #DI V/0! 

0 

5 

#DIV/0! 

1947-1298 

50,  33,  259,  52,  195,  38,  39 

50,  259,  52,  195 

259,  195 

259,  195 

259 

yes 

yes 

Gls

glutaminase  kidney  isoform,  mitoiNP  036701.2 

gijl 58303294  (+1) 

74  kDa 

0 

8  #DI V/0! 

0 

8 

#DIV/0! 

None 

30,74 

74 

Gnao1

guanine  nucleotide-binding  proteiNP  059023.1 

gij8394152 

40  kDa 

0 

2  #DIV/0! 

0 

12 

#DIV/0! 

None 

None 

Hadha 

trifunctional  enzyme  subunit  alph;NP_570839.2 

gijl  48747393 

83  kDa 

1  0 

4 

#DIV/0! 

0 

4  #DIV/0! 

12 

36 

3.00 

None

None 

Hnrnpa3 

heterogeneous  nuclear  ribonudecNP  001104764.1 

gi|  162329577  (+3) 

40  kDa 

0 

4  #DIV/0! 

0 

10 

#DIV/0! 

39 

yes 

yes 

Hnrnpf 

heterogeneous  nuclear  ribonudeiNP  071792.1 

gi|25742579  (+4) 

46  kDa 

2

1  0 

13 

#DIV/0! 

0 

2  #DIV/0! 

0 

13 

#DIV/0!

None

None 

yes 

Hnrnph2 

PREDICTED:  heterogeneous  nucXP  006257281.1 

gi|564399943  (+3) 

49  kDa 

0 

7 

#DIV/0! 

0 

27 

#DIV/0! 

None 

None 

yes 

Hnrnpm

heterogeneous  nuclear  ribonudeiNP  446328.2 

gi|1 581 86696  (+1) 

74  kDa 

0 

3 

#DIV/0! 

0 

4 

#DIV/0! 

None 

58,  30,  31 

58 

yes 

yes 

Hnrnpu 

heterogeneous  nuclear  ribonudeiNP  476480.2 

gij  148747541  (+2) 

88  kDa 

0 

8 

#DIV/0! 

0 

6 

#DIV/0! 

None 

None 

yes 

Hsp90aa1 

heat  shock  protein  HSP  90-alpha  NP_786937.1 

gi|28467005 

85  kDa 

0 

10 

#DIV/0! 

0 

37 

#DIV/0! 

None

55,  30 

55 

yes 

yes 

yes 

Hspa8 

heat  shock  cognate  71  kDa  prate  NP  077327.1 

gi|  13242237 

71  kDa 

6 

0 

31 

#DIV/0! 

4 

11  2.75 

0 

20 

#DIV/0! 

None 

46,  36 

yes 

Idh2 

isocitrate  dehydrogenase  [NADP]NP  001014183.1 

gij62079055 

51  kDa 

0 

2 

#DIV/0! 

0 

4  #DI V/0! 

14 

32 

2.29 

None

None 

Idh3b 

isocitrate  dehydrogenase  [NAD]  sNP  446033.1 

gi|55926203  (+2) 

42  kDa 

0 

2 

#DIV/0! 

0 

14 

#DIV/0! 

None

None 

Kprp 

keratinocyte  proline-rich  protein  [INP  001002290.1 

gij50511334(+1) 

76  kDa 

6 

0 

12 

#DIV/0! 

0 

4  #DI V/0! 

None 

None 

Lap3 

cytosol  aminopeptidase  [Rattus  nNP  001011910.1 

gi|58865398 

56  kDa 

0 

2  #DI V/0! 

0 

8 

#DIV/0! 

None

None 

yes 

Ldha 

PREDICTED:  L-lactate  dehydrog.NP  058721.1 

gij392341842  (+3) 

37  kDa 

0 

2 

#DIV/0! 

0 

2  #DIV/0! 

None

None

Maged1

melanoma-associated  antigen  D1NP_445861.1 

gijl6758144 

86  kDa 

0

12

#DIV/0!

0

2  #DIV/0!

None

354,  40,  30 

354 

354 

354 

354 

Matr3 

PREDICTED:  matrin-3  isoform  X  XP  006254596.1 

gi|564392970  (+2) 

95  kDa 

0 

10 

#DIV/0! 

0 

6 

#DIV/0! 

31,57,42,  119,72 

57,  119,  72 

119 

yes 

yes 

Mlf2 

myeloid  leukemia  factor  2  [RattusNP_001101359.1 

gi|  157823873  (+2) 

28  kDa 

2 

2 

6 

3 

0 

4  #DIV/0! 

0 

4 

#DIV/0! 

None 

136 

136 

136 

Myo1d

unconventional  myosin-ld  [RattusNP  037115.2 

gij56799396 

116  kDa 

1  0 

2 

#DIV/0! 

1 

0 

6 

#DIV/0! 

None

None 

Ndufs7 

PREDICTED:  NADH  dehydrogen  NP  001008525.1 

gi|564358240  (+2) 

25  kDa 

1  0 

2  #DI V/0! 

0 

5 

#DIV/0! 

None

None

Nono

PREDICTED:  non-POU  domain-cNP  001012356.1 

gij564399550  (+1) 

55  kDa 

0 

8 

#DIV/0! 

0 

6 

#DIV/0! 

None 

66 

66 

yes

yes 

Nr3c1 

glucocorticoid  receptor  [Rattus  ncNP  036708.2 

gijl  58303300 

87  kDa 

0 

4 

#DIV/0! 

0 

2  #DIV/0!

None 

146 

146 

146 

Nsfl1c

NSFL1  cofactor  p47  [Rattus  norviNP  114187.1 

gi|1401 0837  (+1) 

41  kDa 

0 

20 

#DI  V/0! 

0 

2  #DI V/0! 

None 

76,  45 

76 

Ogdh 

PREDICTED:  2-oxoglutarate  deh  XP  006251 508.1 

gij564384801  (+3) 

116  kDa 

0 

9 

#DIV/0! 

0 

4  #DI V/0! 

15 

71 

4.73 

None 

None 

Pcbp1

PREDICTED:  poly(rC)-binding  pnXP  008774006.1 

gi|564304213  (+1) 

41  kDa 

2 

0 

14 

#DIV/0! 

0 

8  #DIV/0! 

0 

6 

#DIV/0! 

None 

None 

yes 

Pck2 

phosphoenolpyruvate  carboxykimNP  001101847.2 

gij  1891 63483 

71  kDa 

0 

2 

#DIV/0! 

0 

2  #DIV/0! 

10 

55 

5.50 

None 

None 

Pdhb 

pyruvate  dehydrogenase  El  com  NP  001007621.1 

gi|56090293  (+1) 

39  kDa 

0 

2  #DIV/0!

0 

29 

#DIV/0! 

None 

None 

Phgdh 

D-3-phosphoglycerate  dehydrogeNP  113808.1 

gi|1 3928850  (+1) 

56  kDa 

0 

2 

#DIV/0! 

0 

17 

#DIV/0! 

None 

None 

Plekhb2 

pleckstrin  homology  domain-contiNP  001100369.1 

gi|1 57823297 

25  kDa 

0 

3 

#DIV/0! 

0 

4  #DI V/0! 

None 

None 

Ppia 

PREDICTED:  peptidyl-prolyl  cis-tiNP_058797.1 

gi|564303360  (+3) 

18  kDa 

0 

4 

#DIV/0! 

0 

10 

#DIV/0! 

None 

None 

yes 

yes 

Ppp2r1a 

serine/threonine-protein  phospha  NP  476481.1 

gi|55926139 

65  kDa 

0 

2  #DI V/0! 

0 

8 

#DIV/0!

None 

Prpf40a 

pre-mRNA-processing  factor  40  fNP  001099950.1 

gijl57817077 

108  kDa 

4 

0 

18 

#DIV/0! 

0 

4  #DI V/0! 

11 

24 

2.18 

None 

52,  56,  34,  37,  105,162 

52,  56,  105,  162 

105,  162 

162 

yes 

yes 

Prrc2b 

PREDICTED:  protein  PRRC2B  isXP  006224407.1 

gij564301352  (+3) 

244  kDa 

0 

22 

#DIV/0! 

0 

5  #DI V/0! 

0 

4 

#DIV/0! 

582-715 

191,  123,  194,  31,  157,  37,  38,  166 

191, 123,  194,  157, 166 

191,  123,  194,  157,  166 

191,  194,  157,  166 

yes 

yes 

Psmb2 

proteasome  subunit  beta  type-2  [INP  058980.1 

gi|8394079 

23  kDa 

0 

2 

#DI  V/0! 

0 

2  #DI V/0! 

None 

None 

yes 

Psmc1

26S  protease  regulatory  subunit 'NP  476464.1 

gijl  6923972 

49  kDa 

2 

0 

15 

#DIV/0! 

0 

4  #DI V/0! 

0 

10 

#DIV/0! 

None 

51 

51 

yes 

yes 

Psmc2 

26S  protease  regulatory  subunit  iNP  150239.1 

gijl5100181 

49  kDa 

0 

2 

#DIV/0! 

0 

3 

#DIV/0! 

None 

None 

yes 

Psmc3 

26S  protease  regulatory  subunit  CNP  113783.1 

gijl  3928808 

50  kDa 

0 

3 

#DI  V/0! 

0 

4 

#DIV/0! 

None 

None 

yes 

Rab10

ras-related  protein  Rab-10  [Rattu:NP_059055.2 

gi|6 1889071 

23  kDa 

0 

2 

#DIV/0! 

0 

2  #DIV/0! 

None 

None 

Rad23b 

UV  excision  repair  protein  RAD2cNP  001020446.1 

gi|70778952 

43  kDa 

5 

0 

3 

#DI  V/0! 

0 

10 

#DIV/0! 

None 

120,  67,41 

120,  67 

120 

yes 

Rbms1

PREDICTED:  RNA-binding  motif,  XP  006234300.1 

gij564341064  (+3) 

44  kDa 

0 

2 

#DIV/0! 

0 

2  #DI V/0! 

None

32 

yes 

yes 

Scyl2 

SCYI-like  protein  2  [Rattus  norveNP  001178709.1 

gi|300796288  (+2) 

103  kDa 

0 

14 

#DIV/0! 

0 

2  #DI V/0! 

None

82,  141,  32 

82,  141 

141 

Sf1

splicing  factor  1  isoform  2  [RattusNP  478117.2 

gij  160707952 

60  kDa 

0 

2 

#DI  V/0! 

0 

2  #DIV/0! 

None

53,  44,  32,  33,  32,  260 

53,  260 

260 

260 

260 

yes 

yes 

Sfn 

PREDICTED:  14-3-3  protein  sigrrXP  001065560.2 

gij392340731  (+2) 

28  kDa 

0 

6 

#DIV/0! 

0 

4 

#DIV/0! 

None

None 

Sgta 

PREDICTED:  small  glutamine-ricXP  006241058.1 

gij564358292  (+1) 

41  kDa 

0 

21 

#DIV/0! 

0 

8  #DI V/0! 

None 

40,  71 

71 

yes 

Sik3 

serine/theonine-protein  kinase  SIINP_001258145.2  gi|403310703 

146  kDa 

0 

2 

#DIV/0! 

0

5

#DIV/0!

None

162,55,  65,40,214,  103,  38 

162,  55,  65,214,  103 

162,  214,  103 

162,  214 

214 

Sqstm1

sequestosome-1  isoform  1  [Rattu  NP  787037.2 

None 

yes 

Suclg2 

succinyl-CoA  ligase  [GDP-formin<NP  001094220.1 

gijl  89491689 

47  kDa 

0 

2  #DIV/0! 

0 

10 

#DIV/0! 

None 

None 

Sumo2 

small  ubiquitin-related  modifier  2  NP  598278.1 

gij  19424298  (+1) 

11  kDa 

2 

0 

10 

#DIV/0! 

0 

8 

#DIV/0! 

None 

None 

yes 

Tardbp 

PREDICTED:  TAR  DNA  binding  fXP  006239444.1 

gij564354058  (+1) 

45  kDa 

4 

0 

12 

#DIV/0! 

0 

6  #DIV/0! 

0 

3 

#DIV/0! 

None 

45,  55 

55 

yes 

yes 

Tcerg1

transcription  elongation  regulator  NP  001100860.2 

gijl 66091476  (+3) 

122  kDa 

2 

0 

6 

#DIV/0! 

0 

9  #DIV/0! 

155-248 

138,  177,  70,  36 

138,  177,  70 

138,  177 

177 

yes 

yes 

Tcf20 

transcription  factor  20  [Rattus  norNP  001124046.1 

gijl 94474014  (+3) 

214  kDa 

0 

8 

#DIV/0! 

0 

2  #DIV/0! 

109-260,  26 

87,  739,  50,32,  709,34,121 

87,  739,  50,  709,  121 

739,  709,  121 

739,  709 

739,  709 

yes 

yes 

Tfg 

PREDICTED:  protein  TFG-like  ist  NP  620200.2 

gij564376567  (+4) 

43  kDa 

0 

20 

#DIV/0! 

0 

2  #DIV/0! 

0 

9 

#DIV/0! 

238-317 

35,  54,  105 

54,  105 

105 

Tgm3 

protein-glutamine  gamma-glutam  NP  001102429.1 

gi|  157822549 

77  kDa 

2 

0 

3 

#DIV/0! 

0 

4  #DIV/0! 

None 

49 

Thy1

PREDICTED:  thy-1  membrane  gl  NP  036805.1 

gij564363286  (+1) 

25  kDa 

1  0 

2 

#DIV/0! 

1  0 

8  #DI V/0! 

1 

None

None 

Tnrc6b 

trinucleotide  repeat-containing  geNP  620200.2 

gi|1 98041672  (+4) 

193  kDa 

0 

16 

#DIV/0! 

0 

4  #DIV/0! 

1214-1357 

34 

yes 

yes 

Tufm 

elongation  factor  Tu,  mitochondriiNP_001099765.1 

gijl 57820845  (+1) 

50  kDa 

0 

2 

#DIV/0! 

0 

2  #DIV/0!

34

141

4.15

None

None 

yes 

Ubqln2 

ubiquilin-2  [Rattus  norvegicus]  NP  001101721.1 

gijl57818175 

67  kDa 

5 

13.5 

#DIV/0! 

None 

yes 

Ubqln4 

ubiquilin-4  [Rattus  norvegicus]  NP  001101158.1 

gi|  1 578187 1 

64  kDa 

0 

15 

#DIV/0! 

0 

5  #DIV/0! 

None

71,33,  68,  79,  58 

71,68,  79,  58 

yes 

Ubxn7l1  (LO(  PREDICTED:  UBX  domain-contaXP  003751122.2 

gij564377181 

43  kDa 

0 

5 

#DIV/0! 

0 

4 

#DIV/0! 

None

30,  115 

115 

115 

yes 

Usp7 

PREDICTED:  ubiquitin  carboxyt-t  XP  006245818.1 

gij564370479  (+4) 

133  kDa 

0 

2 

#DIV/0! 

0 

2  #DI V/0! 

None

38 

yes 

Vps35 

vacuolar  protein  sorting-associate  NP  001099188.2  gi|205360969 

92  kDa 

0 

4 

#DIV/0! 

0 

8 

#DIV/0!

None 

None 

yes 

Xrn2 

PREDICTED:  5'-3’ exoribonuclea  XP  006235218.1 

gi|564343365  (+2) 

109  kDa 

4 

2 

25 

12.5 

0 

2  #DI V/0! 

0 

24 

#DIV/0! 

None

38,  31,  50 

yes 

yes 

Ythdf1

PREDICTED:  YTH  domain  family  NP  001019927.1 

gij564344895  (+1) 

62  kDa 

0

15

#DIV/0!

0

6  #DIV/0!

0

4

#DIV/0!

None

52,  157 

52,  157 

157 

157 

yes 

yes 

Ywhab 

14-3-3  protein  beta/alpha  [Rattus  NP 062250.1 

gij9507243 

28  kDa 

0 

4 

#DIV/0! 

0 

16 

#DIV/0! 

None 

None 

1 

1 

1 

8% 

55% 

43% 

31% 

19% 

12% 

41% 

65% 

40%

Appendix 3: Supplemental File S3 - PERL-based algorithm for examining protein sequences for Q/N-rich regions.


QSEARCH(1)  User Contributed Perl Documentation  QSEARCH(1)

NAME 

Qsearch  -  Amino  acid  sequence  linear-search  program 

SYNOPSIS 

qsearch  [options]  filename 

DESCRIPTION 

A program to analyze Q occurrence in protein sequences contained in the FASTA file named filename. Valid
amino acid sequences are processed one at a time starting at the top of the FASTA file.

Each sequence is broken up into a moving sub-domain containing the number of amino acids specified by
the --length option. Domains are created sequentially starting with the first amino acid in the sequence
and moving linearly in a manner analogous to a moving average. Domains with a smaller than requested
length at the end of the sequence are not analyzed.
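The moving sub-domain scan described above can be sketched as follows. This is a minimal Python illustration, not the qsearch.pl source: the function name qn_rich_windows is hypothetical, and the one-residue step between successive domains is an assumption drawn from the "moving average" analogy. The defaults (length 80, frequency 30), the counted residues (Q, Z, N, B) and the greater-than-or-equal test follow the OPTIONS and OUTPUT sections.

```python
from typing import Iterator, Tuple

TARGET_RESIDUES = frozenset("QZNB")  # residues counted per domain, per the OUTPUT section


def qn_rich_windows(sequence: str, length: int = 80,
                    frequency: int = 30) -> Iterator[Tuple[int, int, str, int]]:
    """Yield (first, last, domain, count) for every full-length window whose
    Q/Z/N/B count is >= frequency. Indices are 1-based and inclusive."""
    seq = sequence.upper()
    if length <= 0 or len(seq) < length:
        return  # windows shorter than the requested length are not analyzed
    # Count the first window once, then update the count as the window slides.
    count = sum(1 for aa in seq[:length] if aa in TARGET_RESIDUES)
    for start in range(len(seq) - length + 1):
        if start > 0:  # assumption: the window advances one residue at a time
            count -= seq[start - 1] in TARGET_RESIDUES
            count += seq[start + length - 1] in TARGET_RESIDUES
        if count >= frequency:
            yield start + 1, start + length, seq[start:start + length], count
```

For example, list(qn_rich_windows(seq, length=50, frequency=10)) would correspond roughly to combining the --length 50 and --frequency 10 options, under the assumptions stated above.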

OPTIONS 

Command  line  options  are  processed  from  left  to  right  and  may  be  specified  more  than  once.  If  conflicting 
options  are  specified,  later  options  override  earlier  ones. 

SHORT  FORM 

-a letters

Equivalent to --aminoacids letters

-f num

Equivalent to --frequency num

-h  Equivalent to --help and can be bundled.

-v [num]

Equivalent to --verbose [num]

LONG  FORM 

--aminoacids letters

Set the list of amino acid one-letter abbreviations. This list is used in the regular expression to parse
each sequence in the FASTA file. If this list does not include all possible amino acid one-letter codes
contained in the file, the parsing will be inaccurate.

The default value of letters is ABCDEFGHIJKLMNOPQRSTUVWXYZ.

--debug [num]

--verbose [num]

Set the debug print level to num. The default value of num is one.

--frequency num

Set the frequency threshold for the occurrence of desired amino acids. The default value of num is 30.

--help

Print full qsearch documentation via perldoc then exit.

--length num

Set the amino acid domain length. The default value of num is 80.

--version

Report version number then exit.

NOTES 

The FASTA file must be free of blank lines and have proper newline characters for the operating system this
program is running upon. These conversions can be achieved with programs such as dos2unix, unix2dos or
PERL.

This program relies on a regular expression to extract a sequence from each FASTA entry. The regular
expression was built to work with the FASTA files associated with the corresponding research, rather than
an extensive test set of FASTA files. Consequently said regular expression might require modification to
read your FASTA file.

Regular expression manipulation is nontrivial. Seek the assistance of an expert before modifying the
qsearch.pl file.
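To illustrate the kind of parsing these notes describe, here is a minimal Python sketch of regular-expression-based FASTA reading. It is not the regular expression used in qsearch.pl, which is not reproduced in this appendix; the function name read_fasta and the splitting strategy are assumptions made for illustration. Only the default letter list is taken from the --aminoacids option.

```python
import re
from typing import Iterator, Tuple

DEFAULT_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # documented default of --aminoacids


def read_fasta(text: str, letters: str = DEFAULT_LETTERS) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) for each entry whose sequence uses only `letters`."""
    valid = re.compile("[%s]+" % re.escape(letters))
    # Split on '>' at the start of a line; anything before the first header is dropped.
    for chunk in re.split(r"^>", text, flags=re.MULTILINE)[1:]:
        header, _, body = chunk.partition("\n")
        sequence = "".join(body.split()).upper()  # unwrap the sequence lines
        if valid.fullmatch(sequence):  # entries with other characters are skipped
            yield header.strip(), sequence
```

Entries containing characters outside the letter list are skipped in this sketch, which mirrors the warning that an incomplete letter list makes parsing inaccurate.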

OUTPUT 

For  each  valid  sequence  in  the  FASTA  file,  the  following  items  are  returned  as  output. 

•  FASTA  Entry 

The  unwrapped,  valid  FASTA  entry  currently  being  processed 

•  Protein:  sequence 

The  extracted  sequence  from  the  FASTA  entry 

•  Number  of  Amino  Acids:  integer 

Total  number  of  amino  acids  in  the  extracted  sequence 

•  Q  total:  integer 

Total  number  of  glutamines  in  the  sequence 

•  Percent  Q:  decimal  % 

Percentage of glutamines in the sequence as computed by (Q total)/(Amino acid total)

•  Z  total:  integer 

Total  number  of  glutamine  or  glutamic  acids  in  the  sequence 

•  Percent  QZ:  decimal  % 

Percentage of glutamines and glutamic acids in the sequence as computed by [(Q total) + (Z
total)]/(Amino acid total)
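The whole-sequence values listed above amount to simple counts and ratios. The Python sketch below shows one way to compute them; it is illustrative only and not taken from qsearch.pl. The function name sequence_summary is hypothetical, and it assumes that "Z total" counts occurrences of the one-letter code Z (the glutamine-or-glutamic-acid ambiguity code), which is the reading under which adding Q total and Z total does not count any residue twice.

```python
def sequence_summary(sequence: str) -> dict:
    """Return the per-sequence totals and percentages listed in OUTPUT."""
    seq = sequence.upper()
    n = len(seq)
    q_total = seq.count("Q")  # glutamine
    z_total = seq.count("Z")  # assumption: literal Z (Glu-or-Gln ambiguity code)

    def percent(count: int) -> float:
        # Guard against an empty sequence to avoid division by zero.
        return 100.0 * count / n if n else 0.0

    return {
        "amino_acids": n,
        "q_total": q_total,
        "percent_q": percent(q_total),
        "z_total": z_total,
        "percent_qz": percent(q_total + z_total),
    }
```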

For each domain in the sequence containing a number of Q, Z, N, and B amino acids greater than or equal
to the value specified with the --frequency option, the following items are also returned as output.

•  length  Amino  Acid  Regions 

Header noting the start of domain analysis, length is the value set with the --length option.

•  first  -  last 

Integer index values indicating the beginning and ending amino acids from the sequence defining the
current domain

•  domain 

Amino  acid  sequence  for  the  current  domain 

•  Q  total:  integer 

Total  number  of  glutamines  in  the  domain 

•  Z  total:  integer 

Total  number  of  glutamine  or  glutamic  acids  in  the  domain 

•  N  total:  integer 

Total  number  of  asparagines  in  the  domain 

•  B  total:  integer 

Total  number  of  asparagine  or  aspartic  acids  in  the  domain 

•  QZNB  total:  integer 

Total number of Q, Z, N or B amino acids in the domain

EXAMPLES

These examples assume that qsearch has been aliased to qsearch.pl. Since qsearch reports output to the
screen, it may be helpful to redirect output into a new file.

qsearch rat > rat.out

Search the sequences contained in the file rat returning the usual information and redirecting it to a new file
called rat.out.

qsearch -f 10 file.fasta

Search the sequences contained in file.fasta reporting the usual output information for all domains
containing at least 10 occurrences of Q, Z, N or B.

qsearch -l 50 fasta.txt

Search the sequences contained in fasta.txt reporting the usual output information for domains
containing 50 amino acids.

qsearch -h

Print the full qsearch documentation and then exit.

qsearch --version

Print the software version number for qsearch and then exit.

INSTALLATION 

To  install  qsearch,  just  place  qsearch.pl  in  a  folder  and  make  it  executable.  Then  execute  the  program 
passing  the  necessary  options  and  file  on  the  command  line. 

On UNIX systems such as Mac OS X, use the following command in Terminal to make a file executable.

chmod a+x qsearch.pl

Then use the following command from within the folder containing qsearch.pl to execute the program and
return the version in Terminal.

./qsearch.pl --version

Other operating systems follow a similar procedure to impart executability. See Google or your
neighborhood programmer for more syntax details. :-)

This  program  was  written  and  tested  in  PERL  version  5.18.2  on  a  Mac  but  should  be  backward  compatible 
with  previous  PERL  versions  on  other  operating  systems. 

VERSION 

2.0 

PROGRAMMER 

Dr. Jason L. Sonnenberg, <sonnenberg.ll@osu.edu>

COPYRIGHT 

This  utility  is  free  software;  you  can  redistribute  it  and/or  modify  it  under  the  same  terms  as  PERL  itself.  I 
would like to hear of any suggestions for improvement.

Appendix 4: Supplemental File S4 - Proteins identified by meta-analysis of Htt-polyQ associated protein studies in yeast.


Supplemental File 4: Overlapping Htt-polyQ aggregate-associated proteins between TAPI, SILAC (Park 2013) and 2D gel analysis (Wang 2008) in yeast. Biophysical characterization of overlapping protein hits including ID domains and molecular function.

Bmh1p
BMH1

2  YDR171W

HSP42

HSP42
Pgk1
Pin3p

PIN3 

PIN3 

5  YOR007C 

Sgt2p 

SGT2 

SGT2 

6  YNL007C 

Sis1p

SIS1 

SIS1 

7  YAL005C 

SSA1 

SSA1 

SSA1 

8  YLL024C 

SSA2 

SSA2 

9  YNL208W 

YNL208W 

YNL208W 

YNL208W 

10  YFL039C 

Act1

ACT1 

11  YDR099W 

Bmh2p 

BMH2 

12  YAL021C 

Ccr4p 

CCR4 

13  YBR112C 

Cyc8p 

CYC8 

14  YKL054C 

Def1p

DEF1 

15  YDL161W 

Ent1p

ENT1 

16  YDL226C 

17  YLL026W 

Gcs1p

HSP104 

GCS1 

18  YKL032C 

Ixr1p

IXR1 

19  YOR197W 

Mca1p

MCA1 

20  YPL190C 

Nab3p 

NAB3 

21  YKL068W 

Nup100p

NUP100 

22  YMR047C 

Nup116p 

NUP116 

23  YIR006C 

Pan1p

PAN1 

24  YGR178C 

Pbp1p

PBP1 

25  YDL053C 

Pbp4p 

PBP4 

26  YNL016W 

Pub1p

PUB1 

27  YLL013C 

Puf3p 

PUF3 

28  YGL014W 

Puf4p 

PUF4 

29  YCL028W 

Rnq1p

RNQ1 

30  YIL109C 

Sec24p 

SEC24 

31  YBL007C 

Sla1p

SLA1 

32  YHR030C 

Slt2p 

SLT2 

33  YPR088C 

Srp54p 

SRP54 

34  YDR172W 

Sup35p 

SUP35 

35  YBR198C 

Taf5p 

TAF5 

36  YNL064C 

Ydj1p

YDJ1 

37  YGR250C 

YGR250C 

YGR250C 

38  YMR124W 

YMR124W
Mason 2013  Willingham 2003  Kayatekin 2014  Giorgini 2005


Protein Function
14-3-3 protein, major isoform  YER177W
YCR012W
Glutamine-rich cytoplasmic cochape  YOR007C

YNL007C 

YAL005C 

YLL024C 

Protein  of  unknown  function;  may  ini  YNL208W 

YFL039C 

YDR099W 

CCR4-NOT is involved in regulation of c  YAL021C
General transcriptional co-repressor; ca  YBR112C
RNAPII degradation factor  YKL054C

Epsin-like protein involved in endocytos  YDL161W
ADP-ribosylation factor GTPase activati  YDL226C

YLL026W

Transcriptional repressor that regulates  YKL032C
Ca2+-dependent cysteine protease; req  YOR197W
RNA-binding protein, subunit of Nrd1 cc  YPL190C
FG-nucleoporin component of central ct  YKL068W
FG-nucleoporin component of central ct  YMR047C
Part of actin cytoskeleton-regulatory cor  YIR006C
Component of glucose deprivation induc  YGR178C
Pbp1p binding protein; interacts strongl  YDL053C
Poly(A)+ RNA-binding protein; abundar  YNL016W
mRNA-binding protein: Protein of the mi  YLL013C
preferentially binds mRNAs encoding ni  YGL014W

YCL028W

Component of the Sec23p-Sec24p hete  YIL109C
Cytoskeletal protein binding protein; req  YBL007C
Serine/threonine MAP kinase; involved i  YHR030C
Signal recognition particle (SRP) subuni  YPR088C

YDR172W

Subunit (90 kDa) of TFIID and SAGA co  YBR198C
Type I HSP40 co-chaperone; involved ir  YNL064C
Putative RNA binding protein; localizes  YGR250C
Protein involved in septin-ER tethering;  YMR124W
23.17
6.36
12.8
4.84

6.26 
6.67 
9.5 

7.39 

4.39 
3.42 
5.38 
8.27
physical characterization of
Y
Y
Y

None

41

Y
Y
966
371

Y

40, 172, 371
387, 59

387
Y
Y

597

158, 199, 106
Y
71
Y

71
Y
798
134, 56
Y

134, 56
Y
Y
Y